4.5 Other Parser Generators

4.5.1 Bison

Bison is a parser generator by GNU organization, similar to yacc, written mainly by Robert Corbett and Richard Stallman. It is generally upward compatible with input files designed for yacc.

Input files should follow the yacc convention of ending in .y. Unlike yacc, the generated files do not have fixed names, but instead use the prefix of the input file. Moreover, if you need to put C++ code in the input file, you can end the name by a C++ -like extension (.ypp or .y++), then bison will follow your extension to name the output file (.cpp or .c++). For instance, a grammar description file named parse .yxx would produce the generated parser in a file named parse. tab.cxx, instead of yacc's y.tab.c.

Bison has several options to control the types of outputs, including a graph of the FSM. The latest version (2.4.3 – 05-Aug-2010) has several interesting extensions beyond the original yacc. We describe here some of them.

General LR Parse – GLR

This seems to be the most interesting extension to yacc. As we know yacc is a generator for an LALR(1) parser. It can also handle in some cases look-ahead by 2 and implement partial LR(1) behaviour. GNU Bison adds a new capability – “parallel” search of reductions to resolve conflicts. We quote from the Bison documentation:

In some grammars, Bison's standard LALR(1) parsing algorithm cannot decide whether to apply a certain grammar rule at a given point. That is, it may not be able to decide (on the basis of the input read so far) which of two possible reductions (applications of a grammar rule) applies, or whether to apply a reduction or read more of the input and apply a reduction later in the input. These are known respectively as Reduce/Reduce conflicts and Shift/Reduce conflicts.

To use a grammar that is not easily modified to be LALR(1), a more general parsing algorithm is sometimes necessary. If you include %glr-parser among the Bison declarations in your file, the result is a generalized LR (GLR) parser. These parsers handle Bison grammars that contain no unresolved conflicts (i.e. after applying precedence declarations) identically to LALR(1) parsers. However, when faced with unresolved Shift/Reduce and Reduce/Reduce conflicts, GLR parsers use the simple expedient of doing both, effectively cloning the parser to follow both possibilities. Each of the resulting parsers can again split, so that at any given time, there can be any number of possible parses being explored. The parsers proceed in lockstep; that is, all of them consume (shift) a given input symbol before any of them proceed to the next. Each of the cloned parsers eventually meets one of two possible fates: either it runs into a parsing error, in which case it simply vanishes, or it merges with another parser, because the two of them have reduced the input to an identical set of symbols.

During the time that there are multiple parsers, semantic actions are recorded, but not performed. When a parser disappears, its recorded semantic actions disappear as well and are never performed. When a reduction makes two parsers identical, causing them to merge, Bison records both sets of semantic actions. Whenever the last two parsers merge, reverting to the single-parser case, Bison resolves all the outstanding actions either by precedences given to the grammar rules involved or by performing both actions, and then calling a designated user-defined function on the resulting values to produce an arbitrary merged result.

Parser Written in C++ and Java

Bison can generate parsers written in C++ and Java, with a few limitations at present.

4.5.2 Parse::Yapp

As Perl language has powerful string manipulation and regular expression handling facilities, it is a good candidate for implementing a Parser generator. One of the more well-known ones is Parse::Yapp, which is a Perl extension for generating and using LALR parsers. Parse::Yapp (Yet Another Perl Parser compiler) is a collection of modules that let you generate and use yacc-like thread safe (re-entrant) parsers with Perl object-oriented interface.

The script yapp is a front-end to the Parse::Yapp module and lets you easily create a Perl object-oriented parser from an input grammar file.

The grammar file for Yapp has a syntax similar to the grammar file for yacc.

4.5.3 ANTLR

Another Tool for Language Recognition

ANTLR is the name of a parser generator that uses LL(*) parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989 and is under active development. Its mentor is Professor Terence Parr of the University of San Francisco.

ANTLR rules are expressed in a format deliberately similar to EBNF instead of the regular expression syntax employed by other parser generators.

At the moment, ANTLR supports generating code in the following languages: C, C++, Java, Python, C#, Objective-C. ANTLR-3 is a free software, published under a three-clause BSD license.

For an introduction, see the ANTLR tutorial at the University of Birmingham. For background on the theory, see articles from the ANTLR pages, e.g. an ANTLR journal paper.

Several plugins have been developed for the Eclipse development environment to support the ANTLR grammar. There is ANTLR Studio, a proprietary product, as well as the ANTLR-2 and -3 plug-ins for Eclipse hosted on Source-Forge.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.97.75