Chapter 8.  Building a Parser and Interpreter for a Custom Language

Extensibility and adaptability are often the required features in enterprise applications. Often, it is useful and practical-or even an actual feature requirement by users-to change an application's behavior and business rules at runtime. Imagine, for example, an e-commerce application in which sales representatives can configure business rules themselves; for example, when the system should offer free shipping for a purchase or should apply a certain discount when some special conditions are met (offer free shipping when the purchase amount exceeds 150 Euros , and the customer has already made two or more purchases in the past or has been a customer for more than a year).

By experience, rules such as these tend to get ridiculously complex (offer a discount when the customer is male and is older than 35 years and has two kids and a cat named Mr. Whiskers and placed the purchase order on a cloudless full-moon night) and may change frequently. For this reason, as a developer, you might actually be glad to offer your user a possibility to configure rules such as these for themselves, instead of having to update, test, and redeploy the application every time one of these rules changes. A feature like this is called end-user development and is often implemented using  domain-specific languages.

Domain-specific languages are languages that are tailored for one specific application domain (in contrast to general-purpose languages, such as C, Java or-you guessed it-PHP). In this chapter, we will build our own parser for a small expression language that can be used to configure business rules in enterprise applications.

For this, we'll need to recapitulate how a parser works and how formal languages are described using formal grammars.

How interpreters and compilers work

Interpreters and compilers read programs that are formulated in a programming language. They either execute them directly (interpreters) or first convert them into a machine language or another programming language (compilers). Both interpreters and compilers usually have (among others) two components called lexer and parser.

How interpreters and compilers work

This is a basic architecture of a compiler or interpreter

An interpreter may omit the code generation and run the parsed program directly without a dedicated compilation step.

The lexer (also called scanner or tokenizer) dissects an input program into its smallest possible parts, the so-called tokens. Each token consists of a token class (for example, numerical value or variable identifier) and the actual token contents. For example, a lexer for a calculator given the input string 2 + (3 * a) might generate the following list of tokens (each having a token class and value):

  1. Number ("2")
  2. Addition operator ("+")
  3. Opening bracket ("(")
  4. Number ("3")
  5. Multiplication operator ("*")
  6. Variable identifier ("a")
  7. Closing bracket (")")

In the next step, the parser takes the token streams and tries to derive the actual program structure from this stream. For this, the parser needs to be programmed with a set of rules that describe the input language, a grammar. In many cases, a parser generates a data structure that represents the input program in a structured tree; the so-called syntax tree. For example, the input string 2 + (3 * a) generates the following syntax tree:

How interpreters and compilers work

An Abstract Syntax Tree (AST) that can be generated from the expression 2 + (3 * a)

Note that there are programs that will pass the lexical analysis, but in the following step, they are recognized as syntactically wrong by the parser. For example, the input string called 2 + ( 1 would pass the lexer (and generate a token list such as {Number(2), Addition Operator, Opening bracket, Number(1)}), but it is obviously syntactically wrong as the opening bracket does not make any sense without a matching closing bracket (assuming that the parser uses the grammar universally recognized for mathematical expressions; in other grammars, 2+(1 might actually be a syntactically valid expression)

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.146.37.250