Chapter 9. Text Pattern Expressions

Text pattern expressions perform operations on the sets of possible text values that one or more terms recognize.

Primary Expressions

A primary expression can be:

  • A text literal

  • A reference to a syntax or token rule

  • An expression indicating a repeated sequence of primary expressions of a specified length

  • An expression indicating any of a contiguous range of characters

  • An inline sequence of pattern declarations

The following grammar reflects this structure.

Primary:
  ReferencePrimary
  TextLiteral
  RepetitionPrimary
  CharacterClassPrimary
  InlineRulePrimary
  AnyPrimary

Character Class

A character class is a compact syntax for a contiguous range of characters. Both operands must be text literals of length 1, and the Unicode offset of the right operand must be greater than that of the left.

CharacterClassPrimary:
  TextLiteral .. TextLiteral

The expression "0".."9" is equivalent to:

"0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
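The expansion above can be modeled outside the grammar language. The following Python sketch is only an illustration of the rule as stated (operands of length 1, right operand with a greater Unicode offset); the function name character_class is hypothetical and is not part of the language.

```python
def character_class(low: str, high: str) -> set:
    """Expand low..high into the set of one-character text values it recognizes.

    Hypothetical helper: it models the stated rule, it is not the language's API.
    """
    if len(low) != 1 or len(high) != 1:
        raise ValueError("both operands must be text literals of length 1")
    if ord(high) <= ord(low):
        raise ValueError("the right operand must have a greater Unicode offset")
    # Every character whose offset lies between the two operands, inclusive.
    return {chr(code) for code in range(ord(low), ord(high) + 1)}


digits = character_class("0", "9")
```

Here digits holds the same ten values as the alternation shown above, and a reversed range such as "9".."0" is rejected.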

References

A reference primary is the name of another rule, possibly with arguments for parameterized rules. All rules defined within the same language can be accessed without qualification. The protocol for accessing rules defined in a different language within the same module is defined in Section 12.2. The protocol for accessing rules defined in a different module is defined in Section 13.3.

ReferencePrimary:
  GrammarReference
GrammarReference:
  Identifier
  GrammarReference  . Identifier
  GrammarReference  . Identifier ( TypeArguments  )
  Identifier ( TypeArguments  )
TypeArguments:
  PrimaryExpression
  TypeArguments  , PrimaryExpression

Note that whitespace between a rule name and its argument list is significant: it distinguishes a reference to a parameterized rule from a reference without parameters followed by an inline rule. In a reference to a parameterized rule, no whitespace is permitted between the identifier and the arguments.

Repetition Operators

The repetition operators recognize a primary expression repeated a specified number of times. The number of repetitions can be stated as a (possibly open) integer range or using one of the Kleene operators, ?, +, *.

RepetitionPrimary:
  Primary Range
  Primary CollectionRanges
Range:
  ?
  *
  +
CollectionRanges:
  #IntegerLiteral
  #IntegerLiteral .. IntegerLiteral opt

The left operand of .. must be greater than zero and less than the right operand of .., if present.

"A"#5           recognizes exactly 5 "A"s        "AAAAA"
"A"#2..4        recognizes from 2 to 4 "A"s      "AA", "AAA", "AAAA"
"A"#3..         recognizes 3 or more "A"s        "AAA", "AAAA", "AAAAA", . . .

The Kleene operators can be defined in terms of the collection range operator:

"A"? is equivalent to "A"#0..1
"A"+ is equivalent to "A"#1..
"A"* is equivalent to "A"#0..
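As a rough model of these equivalences, the Python sketch below treats every repetition operator as a count range over a repeated unit. The names count_repeats, matches_range, optional, one_or_more, and zero_or_more are hypothetical, and the functions test whether a whole text value is recognized rather than scanning a longer input.

```python
from typing import Optional


def count_repeats(text: str, unit: str) -> Optional[int]:
    """Return n if text is exactly unit repeated n times, otherwise None."""
    if not unit:
        raise ValueError("unit must not be empty")
    n, remainder = divmod(len(text), len(unit))
    if remainder != 0 or unit * n != text:
        return None
    return n


def matches_range(text: str, unit: str, low: int, high: Optional[int]) -> bool:
    """Model unit#low..high; high=None models the open range unit#low.."""
    n = count_repeats(text, unit)
    if n is None:
        return False
    return n >= low and (high is None or n <= high)


def optional(text: str, unit: str) -> bool:      # models "A"?
    return matches_range(text, unit, 0, 1)


def one_or_more(text: str, unit: str) -> bool:   # models "A"+
    return matches_range(text, unit, 1, None)


def zero_or_more(text: str, unit: str) -> bool:  # models "A"*
    return matches_range(text, unit, 0, None)
```

Under this model, matches_range("AAA", "A", 2, 4) holds, while the empty text is accepted by optional and zero_or_more but not by one_or_more.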

Inline Rules

An inline rule is a means to group pattern declarations together as a term.

InlineRulePrimary:
  ( ProductionDeclarations )

An inline rule is typically used in conjunction with a repetition operator:

"A" ("," "A")* recognizes 1 or more "A"s separated by commas.
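For readers more familiar with regular expressions, the same shape can be approximated with Python's re module, where a non-capturing group plays the role of the inline rule. This is an analogy only; the pattern and the function name is_comma_separated_as are assumptions made for illustration.

```python
import re

# "A" ("," "A")*  approximated as a regular expression:
# one "A", then zero or more repetitions of the group ",A".
COMMA_SEPARATED_AS = re.compile(r"A(?:,A)*")


def is_comma_separated_as(text: str) -> bool:
    """Return True if the whole text is one or more "A"s separated by commas."""
    return COMMA_SEPARATED_AS.fullmatch(text) is not None
```

With this pattern, "A" and "A,A,A" are accepted, while the empty text, "A," and "AA" are rejected.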

Although syntactically legal, variable bindings within inline rules are not accessible within the constructor of the containing production. Inline rules are described further in Section 11.4.

Any

The any term is a wildcard that matches any text value of length 1.

AnyPrimary:
  any

"1", "z", and "*" all match any.

Error

The error production enables error recovery. Consider the following example:

module HelloWorld {
    language HelloWorld {
        syntax Main
          = HelloList;
        token Hello
          = "Hello";
        checkpoint syntax HelloList
          = Hello
          | HelloList "," Hello
          | HelloList "," error;
    }
}

The language recognizes the text "Hello,Hello,Hello" as expected and produces the following default output:

Main[
  HelloList[
    HelloList[
      HelloList[
        Hello
      ],
      ,,
      Hello
    ],
    ,,
    Hello
  ]
]

The text "Hello,hello,Hello" is not in the language because the second "h" is not capitalized (and case sensitivity is true). However, rather than stopping at "h", the language processor matches "h" to the error token, then matches "e" to the error token, and so forth, until it reaches the comma. At this point the text conforms to the language and normal processing can continue. The language processor reports the position of the errors and produces the following output:

Main[
  HelloList[
    HelloList[
      HelloList[
        Hello
      ],
      error["hello"],
    ],
    ,,
    Hello
  ]
]

Hello occurs twice instead of three times as above, and the text that the error token matched is returned as error["hello"].
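The recovery behavior described here, consuming characters into an error token until the next comma, can be imitated with a toy scanner. The Python sketch below is not how the language processor is implemented; it is a simplified model under the assumption that each comma-separated piece is either the token Hello or an error, and the name recognize_hello_list is hypothetical.

```python
def recognize_hello_list(text: str):
    """Toy model of error recovery for a comma-separated list of "Hello".

    Returns (items, errors): items lists what was recognized in order,
    and errors lists (position, matched_text) for each error token.
    """
    items = []
    errors = []
    position = 0  # offset of the current piece within text
    for piece in text.split(","):
        if piece == "Hello":
            items.append("Hello")
        else:
            # Everything up to the next comma is absorbed by the error token.
            items.append('error["%s"]' % piece)
            errors.append((position, piece))
        position += len(piece) + 1  # skip the piece and its trailing comma
    return items, errors


items, errors = recognize_hello_list("Hello,hello,Hello")
```

For the input above, the model recognizes Hello twice, records one error token for "hello", and reports its starting offset as 6.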

Term Operators

A primary term expression can be thought of as the set of possible text values that it recognizes. The term operators perform the standard set difference, intersection, and negation operations on these sets. (Pattern declarations perform the union operation with |.)

TextPatternExpression:
  Difference
Difference:
  Intersect
  Difference  - Intersect
Intersect:
  Inverse
  Intersect  & Inverse
Inverse:
  Primary
  ^ Primary

Inverse requires every value in the set of possible text values to be of length 1.

  • ("11" | "12") - ("12" | "13") recognizes "11".

  • ("11" | "12") & ("12" | "13") recognizes "12".

  • ^("11" | "12") is an error.

  • ^("1" | "2") recognizes any text value of length 1 other than "1" or "2".
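Because each operand can be viewed as a set of text values, the four examples can be checked with ordinary set operations. The Python sketch below is a model under assumptions: ALPHABET uses string.printable as a small stand-in for "any text value of length 1", and the function names difference, intersect, and inverse are hypothetical.

```python
import string

# Stand-in universe for "any text value of length 1" (an assumption for
# illustration; the real universe is every single character).
ALPHABET = set(string.printable)


def difference(left: set, right: set) -> set:
    """Model of the - operator: values in left that are not in right."""
    return left - right


def intersect(left: set, right: set) -> set:
    """Model of the & operator: values in both left and right."""
    return left & right


def inverse(values: set, alphabet: set = ALPHABET) -> set:
    """Model of the ^ operator: single characters not in values."""
    if any(len(value) != 1 for value in values):
        raise ValueError("inverse requires every value to be of length 1")
    return alphabet - values


only_11 = difference({"11", "12"}, {"12", "13"})
only_12 = intersect({"11", "12"}, {"12", "13"})
not_1_or_2 = inverse({"1", "2"})
```

In this model only_11 contains just "11", only_12 contains just "12", not_1_or_2 contains every character of the stand-in alphabet except "1" and "2", and inverse({"11", "12"}) raises an error.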
