Defining a TOY language

Before implementing a lexer and parser, the syntax and grammar of the language need to be determined first. In this chapter, a TOY language is used to demonstrate how a lexer and a parser can be implemented. The purpose of this recipe is to show how a language is skimmed through. For this purpose, the TOY language to be used is simple but meaningful.

A language typically has some variables, some function calls, some constants, and so on. To keep things simple, our TOY language in consideration has only numeric constants of 32-bit Integer type A, a variable that need not declare its type (like Python, in contrast to C/C++/Java, which require a type declaration) in the TOY language.

How to do it…

The grammar can be defined as follows (the production rules are defined below, with non-terminals on Left Hand Side (LHS) and a combination of terminals and non-terminals on Right Hand Side (RHS); when LHS is encountered, it yields appropriate RHS defined in the production rule):

  1. A numeric expression will give a constant number:
    numeric_expr := number
  2. A parenthesis expression will have an expression in between an opening and a closing bracket:
    paran_expr := '(' expression ')'
  3. An identifier expression will either yield an identifier or a function call:
    identifier_expr
    := identifier
    := identifier '('expr_list ')'
  4. If identifier _expr is a function call, it will either have no arguments or list of arguments separated by a comma:
    expr_list
    := (empty)
    := expression (',' expression)*
  5. There will be some primary expression, the starting point of the grammar, which may yield an identifier expression, a numeric expression, or a parenthesis expression:
    primary := identifier_expr
    :=numeric_expr
    :=paran_expr
  6. An expression can lead to a binary expression:
    expression := primary binoprhs
  7. A binary operation with RHS can yield combinations of binary operators and expressions:
    binoprhs := ( binoperator primary )*
    binoperators := '+'/'-'/'*'/'/'
  8. A function declaration can have grammar as follows:
    func_decl := identifier '(' identifier_list ')'
    identifier_list := (empty)
                            := (identifier)*
  9. A function definition is distinguished by a def keyword followed by a function declaration and an expression that defines its body:
    function_defn := 'def' func_decl expression
  10. Finally, there will be a top level expression that will yield an expression:
    toplevel_expr := expression  

An example of the TOY language based on the previously defined grammar can be written as follows:

def foo (x , y)
x +y * 16

Since we have defined the grammar, the next step is to write a lexer and parser for it.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.71.94