Chapter 10. Productions

A production is a pattern and an optional constructor. Each production is a scope. The pattern may establish variable bindings which can be referenced in the constructor. A production can be qualified with a precedence that is used to resolve a tie if two productions match the same text (see Section 10.4.1).

ProductionDeclaration:
  ProductionPrecedenceopt PatternDeclaration Constructoropt
Constructor
  => TermConstructor
ProductionPrecedence:
  precedence IntegerLiteral :

Pattern Declaration

A pattern declaration is a sequence of term declarations or the built-in pattern empty which matches "".

PatternDeclaration:

 

empty

 

TermDeclarationsopt

TermDeclarations:

 

TermDeclaration

 

TermDeclarations TermDeclaration

Term Declaration

A term declaration consists of a pattern expression with an optional variable binding, precedence and attributes. The built-in term error is used for error recovery and discussed in Section 9.1.6.

TermDeclaration:
  error
  Attributesopt TermPrecedenceopt VariableBindingopt TextPatternExpression
VariableBinding:
  Name :
TermPrecedence:
  left ( IntegerLiteral )
  right ( IntegerLiteral )

A variable associates a name with the output from a term which can be used in the constructor. Precedence is discussed in Section 10.4.2.

The error term is used in conjunction with the checkpoint rule modifier to facilitate error recovery. This is described in Section 9.1.6.

Constructors

A term constructor is the syntax for defining the output of a production. A node in a term constructor can be:

  • An atom consisting of

    • A literal

    • A reference to another term

    • An operation on a reference

  • An ordered collection of successors with an optional label

  • An unordered collection of successors with an optional label

The following grammar mirrors this structure.

TermConstructor:
  TopLevelNode
Node:
  Atom
  OrderedTerm
  UnorderedTerm
TopLevelNode:
  TopLevelAtom
  OrderedTerm
  UnorderedTerm
Nodes:
  Node
  Nodes , Node
OrderedTerm:
  Labelopt  [  Nodesopt ]
UnorderedTerm:
  Labelopt {  Nodesopt }
Label:
  Identifier
  id ( Atom )
Atom:
  TopLevelAtom
  valuesof ( VariableReference )
TopLevelAtom:
  TextLiteral
  DecimalLiteral
  LogicalLiteral
  IntegerLiteral
  NullLiteral
  VariableReference
  labelof ( VariableReference )
VariableReference:
  Identifier

Each production defines a scope. The variables referenced in a constructor must be defined within the same production’s pattern. Variables defined in other productions in the same rule cannot be referenced. The same variable name can be used across alternatives in the same rule.

Consider three alternatives for encoding the output of the same production. First, the default constructor (see Section 10.3.2):

module Expression {
    language Expression {
        token Digits = ("0".."9")+;
        syntax Main = E;
        syntax E
          = Digits
          | E "+" E ;
    }
}

Processing the text "1+2" yields:

Main[E[E[1], +, E[2]]]

This output reflects the structure of the grammar and may not be the most useful form for further processing. The second alternative cleans the output up considerably:

module Expression {
    language Expression {
        token Digits = ("0".."9")+;
        syntax Main
          = e:E => e;
        syntax E
          = d:Digits => d
          | l:E "+" r:E => Add[l,r] ;
    }
}

Processing the text "1+2" with this language yields:

Add[1, 2]

This grammar uses three common patterns:

  • Productions with a single term are passed through. This is done for the single production in Main and the first production in E.

  • A label, Add, is used to designate the operator.

  • Position is used to distinguish the left and right operand.

The third alternative uses a record like structure to give the operands names:

module Expression {
    language Expression {
        token Digits = ("0".."9")+;
        syntax Main
          = e:E => e;
        syntax E
          = d:Digits => d
          | l:E "+" r:E => Add{Left{l},Right{r}} ;
    }
}

Processing the text "1+2" with this language yields:

Add{Left{1}, Right{2}}

Although somewhat more verbose than the prior alternative, this output does not rely on ordering and forces consumers to explicitly name Left or Right operands. Although either option works, this has proven to be more flexible and less error prone.

Constructor Operators

Constructor operators allow a constructor to use a variable reference as a label, extract the successors of a variable reference, or extract the label of a variable reference.

Consider generalizing the previous example to support multiple operators. This could be done by adding a new production for each operator -, *, /, ^. Alternatively a single rule can be established to match these operators and the output of that rule can be used as a label using id:

module Expression {
    language Expression {
        token Digits = ("0".."9")+;
        syntax Main
          = e:E => e;
        syntax Op
          = "+" => "Add"
          | "-" => "Subtract"
          | "*" => "Multiply"
          | "/" => "Divide" ;
        syntax E
          = d:Digits => d
          | l:E o:Op r:E => id(o){Left[l],Right[r]} ;
    }
}

Processing the text "1+2" with this language yields the same result as above. Processing "1 / 2" yields:

Divide{Left{1}, Right{2}}

This language illustrates the id operator. See Section 10.4.2 for a more realistic expression grammar.

The valuesof operator extracts the successors of a variable reference. It is used to flatten nested output structures. Consider the language:

module Digits {
    language Digits {
        syntax Main = DigitList ;
        token Digit = "0".."9";
        syntax DigitList
          = Digit
          | DigitList "," Digit ;
    }
}

Processing the text "1, 2, 3" with this language yields:

Main[
  DigitList[
    DigitList[
      DigitList[
        1
      ],
      ,,
      2
    ],
    ,,
    3
  ]
]

The following grammar uses valuesof and the pass through pattern above to simplify the output:

module Digits {
    language Digits {
        syntax Main = dl:DigitList => dl ;
        token Digit = "0".."9";
        syntax DigitList
          = d:Digit => DigitList[d]
          | dl:DigitList "," d:Digit => DigitList[valuesof(dl),d] ;
    }
}

Processing the text "1, 2, 3" with this language yields:

DigitList[1, 2, 3]

This output represents the same information more concisely.

Default Constructor

If a constructor is not defined for a production, the language process defines a default constructor. For a given production, the default projection is formed as follows:

  1. The label for the result is the name of the production’s rule.

  2. The successors of the result are an ordered sequence constructed from each term in the pattern.

  3. * and ? create an unlabeled sequence with the elements.

  4. ( ) results in an anonymous definition.

    1. If it contains constructors (a:A => a), then the output is the output of the constructor.

    2. If there are no constructors, then the default rule applied on the anonymous definition and the output is enclosed in square brackets [ A’s result ]

  5. Token rules do not permit a constructor to be specified and output text values.

  6. Interleave rules do not permit a constructor to be specified and do not produce output.

Consider the following language:

module ThreeDigits {
    language ThreeDigits {
        token Digit = "0".."9";
        syntax Main
          = Digit Digit Digit ;
    }
}

Given the text "123" the default output of the language processor follows:

Main[
  1,
  2,
  3
]

Precedence

The Mg language processor is tolerant of such ambiguity as it is recognizing subsequences of text. However, it is an error to produce more than one output for an entire text value. Precedence qualifiers on productions or terms determine which of the several outputs should be returned.

Production Precedence

Consider the classic dangling else problem as represented in the following language:

module IfThenElse {
    language IfThenElse {
        syntax Main = S;
        syntax S
          = empty
          | "if" E "then" S
          | "if" E "then" S "else" S;
        syntax E = empty;
        interleave Whitespace = " ";
    }
}

Given the input “if then if then else”, two different output are possible. Either the else binds to the first if-then:

if
then
    if
    then
else

Or it binds to the second if-then:

if
then
    if
    then
    else

The following language produces the output immediately above, binding the else to the second if-then.

module IfThenElse {
    language IfThenElse {
        syntax Main = S;
        syntax S
          = empty
          | precedence 2: "if" E "then" S
          | precedence 1: "if" E "then" S "else" S;
        syntax E = empty;
        interleave Whitespace = " ";
    }
}

Switching the precedence values produces the first output.

Term Precedence

Consider a simple expression language which recognizes:

2 + 3 + 4
5 * 6 * 7
2 + 3 * 4
2 ^ 3 ^ 4

The result of these expressions can depend on the order in which the operators are reduced. 2 + 3 + 4 yields 9 whether 2 + 3 is evaluated first or 3 + 4 is evaluated first. Likewise, 5 * 6 * 7 yields 210 regardless of the order of evaluation.

However, this is not the case for 2 + 3 * 4. If 2 + 3 is evaluated first yielding 5, then 5 * 4 yields 20. While if 3 * 4 is evaluated first yielding 12, then 2 + 12 yields 14. This difference manifests itself in the output of the following grammar:

module Expression {
    language Expression {
        token Digits = ("0".."9")+;
        syntax Main = e:E => e;
        syntax E
          = d:Digits => d
          | "(" e:E ")" => e
          | l:E "^" r:E => Exp[l,r]
          | l:E "*" r:E => Mult[l,r]
          | l:E "+" r:E => Add[l,r];
        interleave Whitespace = " ";
    }
}

" 2 + 3 * 4" can result in two outputs:

Mult[Add[2, 3], 4]
Add[2, Mult[3, 4]]

According to the rules we all learned in school, the result of this expression is 14 because multiplication is performed before addition. This is expressed in Mg by assigning "*" a higher precedence than "+". In this case the result of an expression changed with the order of evaluation of different operators.

The order of evaluation of a single operator can matter as well. Consider 2 ^ 3 ^ 4. This could result in either 8 ^ 4 or 2 ^ 81. In term of output, there are two possibilities:

Exp[Exp[2, 3], 4]
Exp[2, Exp[3, 4]]

In this case the issue is not which of several different operators to evaluate first but which in a sequence of operators to evaluate first, the leftmost or the rightmost. The rule in this case is less well established but most languages choose to evaluate the rightmost "^" first yielding 2 ^ 81 in this example.

The following grammar implements these rules using term precedence qualifiers. Term precedence qualifiers may only be applied to literals or references to token rules.

module Expression {
    language Expression {
        token Digits = ("0".."9")+;
        syntax Main = E;
        syntax E
          = d:Digits => d
          | "(" e:E ")" => e
          | l:E right(3) "^" r:E => Exp[l,r]
          | l:E left(2) "*" r:E => Mult[l,r]
          | l:E left(1) "+" r:E => Add[l,r];
        interleave Whitespace = " ";
    }
}

"^" is qualified with right(3). right indicates that the rightmost in a sequence should be grouped together first. 3 is the highest precedence, so "^" will be grouped most strongly.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.207.123