3.9 Chapter Summary

The source code of a program is simply a string of characters. After comments are purged from the string, scanning (or lexical analysis) partitions the string into the most atomic lexical units based on some delimiter (usually whitespace) and produces a list of these lexical units. The scanner, which models the regular grammar that defines the tokens of the programming language, then determines the validity of these lexical units. If all of the lexical units are lexemes (i.e., valid), the scanner returns a list of tokens—which is input to a parser. The parser, which models the context-free grammar that defines the structure or syntax of the language, determines whether the program is syntactically valid. Parsing (or syntactic analysis) determines whether a list of tokens is in the correct order and, if so, often structures this list into a parse tree. If the parser can construct a parse tree from the list of tokens, the program is syntactically valid; otherwise, it is not. If the program is valid, the result of parsing is typically a parse (or abstract-syntax) tree.

A variety of approaches may be used to build a parser. Each approach has requirements for the form of the grammar used and often offers complementary advantages and disadvantages. Parsers can be generally classified as one of two types: top-down or bottom-up. A top-down parser builds a parse tree starting at the root (or start symbol of the grammar), while a bottom-up parser starts from the leaves. There are two types of top-down parsers: table-driven and recursive descent. A recursive-descent parser is a type of top-down parser that uses functions—one per non-terminal—and the internal run-time stack of activation records for function calls to determine the validity of input strings. The beauty of a recursive-descent parser is that the source code mirrors the grammar. Moreover, the parse table is implicit/embedded in the function definitions constituting the parser code. Thus, a recursive-descent parser is both readable and modifiable. Bottom-up parsing involves use of a shift-reduce method, whereby a rightmost derivation of the input string is constructed in reverse (i.e., the bottom-up nature refers to starting with the terminals of the string and working backward toward the start symbol of the grammar). There are also generators for bottom-up, shift-reduce parsers. The lex tool is a scanner generator for C; the yacc tool is a parser generator for C. In addition, scanner/parser generators are available for a variety of programming languages, including Python (PLY) and Java (e.g., ANTLR). A scanner and a parser constitute the syntactic component (sometimes called the front end) of a programming language implementation (e.g., interpreter or compiler), which we discuss in Chapter 4.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.12.136.63