Basic Form of a Pyparsing Program

The prototypical pyparsing program has the following structure:

  • Import names from the pyparsing module

  • Define grammar using pyparsing classes and helper methods

  • Use the grammar to parse the input text

  • Process the results from parsing the input text

Import Names from Pyparsing

In general, using the form from pyparsing import * is discouraged among Python style experts. It pollutes the local variable namespace with an unknown number of new names from the imported module. However, during pyparsing grammar development, it is hard to anticipate all of the parser element types and other pyparsing-defined names that will be needed, and this form simplifies early grammar development. After the grammar is mostly finished, you can go back to this statement and replace the * with the list of pyparsing names that you actually used.
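As a minimal sketch of that cleanup step, the wildcard import might be swapped for an explicit list once the grammar has settled. The four names chosen here are simply the ones used by the small grammar developed in the next section; a different grammar would need a different list.

```python
# Early development: convenient, but pulls in every public pyparsing name.
# from pyparsing import *

# Once the grammar has settled: import only the names the grammar uses.
from pyparsing import Word, alphas, alphanums, nums

# alphas, alphanums, and nums are ordinary strings of characters.
print(alphas)
print(nums)
```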

Define the Grammar

The grammar is your definition of the text pattern that you want to extract from the input text. With pyparsing, the grammar takes the form of one or more Python statements that define text patterns, and combinations of patterns, using pyparsing classes and helpers to specify these individual pieces. Pyparsing allows you to use operators such as +, |, and ^ to simplify this code. For instance, if I use the pyparsing Word class to define a typical programming variable name consisting of a leading alphabetic character with a body of alphanumeric characters or underscores, I would start with the Python statement:

identifier = Word(alphas, alphanums+'_')

I might also want to parse numeric constants, either integer or floating point. A simplistic definition uses another Word instance, defining our number as a "word" composed of numeric digits, possibly including a decimal point:

number = Word(nums+".")
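To see why this definition is only a first approximation, note that a Word built from digits and a period accepts any run of those characters, not just well-formed numbers. The sample inputs below are made up for illustration, and the sketch assumes Python 3 syntax.

```python
from pyparsing import Word, nums

number = Word(nums + ".")

# Well-formed numbers parse as expected...
well_formed = [number.parseString(s)[0] for s in ["10", "3.14159"]]

# ...but so do strings that nobody would call a number.
malformed = [number.parseString(s)[0] for s in ["1.2.3", "..."]]

print(well_formed)
print(malformed)
```

A stricter definition would constrain where the decimal point may appear; the simple form is enough for the examples in this section.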

From here, I could then define a simple assignment statement as:

assignmentExpr = identifier + "=" + (identifier | number)

and now I have a grammar that will parse any of the following:

a = 10
a_2=100
pi=3.14159
goldenRatio = 1.61803
E = mc2

In this part of the program you can also attach parse-time callbacks (called parse actions) or define names for significant parts of the grammar, to ease the job of locating those parts later. Parse actions are a very powerful feature of pyparsing, and I will cover them in detail later.

Use the Grammar to Parse the Input Text

In early versions of pyparsing, this step was limited to using the parseString method, as in:

assignmentTokens = assignmentExpr.parseString("pi=3.14159")

to retrieve the matching tokens as parsed from the input text.
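Putting the pieces so far together, the following sketch runs the assignment grammar over the sample inputs listed earlier. It assumes Python 3 syntax, and the non-matching input at the end is a made-up example to show that parseString raises a ParseException when the text does not fit the grammar.

```python
from pyparsing import Word, alphas, alphanums, nums, ParseException

identifier = Word(alphas, alphanums + "_")
number = Word(nums + ".")
assignmentExpr = identifier + "=" + (identifier | number)

samples = ["a = 10", "a_2=100", "pi=3.14159", "goldenRatio = 1.61803", "E = mc2"]

# Parse each sample and keep the tokens as a plain Python list.
parsed = [list(assignmentExpr.parseString(s)) for s in samples]
for text, tokens in zip(samples, parsed):
    print(text, "->", tokens)

# Text that does not match the grammar raises ParseException.
try:
    assignmentExpr.parseString("= 10")
    matched = True
except ParseException as err:
    matched = False
    print("no match:", err)
```

Note that whitespace around the = sign is skipped automatically, so "a = 10" and "a_2=100" are handled by the same expression.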

The options for using your pyparsing grammar have increased since the early versions. With later releases of pyparsing, you can use any of the following:

parseString

Applies the grammar to the given input text.

scanString

Scans through the input text looking for matches; scanString is a generator function that yields the matched tokens, along with their start and end locations within the text, as each match is found.

searchString

A simple wrapper around scanString, returning a list containing each set of matched tokens within its own sublist.

transformString

Another wrapper around scanString, to simplify replacing matched tokens with modified text or replacement strings, or to strip text that matches the grammar.
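As a rough preview of how the three scanning methods differ, here is a sketch that applies the assignment grammar to a made-up line of text. It assumes Python 3 syntax; the sample text and the masking rule passed to setParseAction are hypothetical choices for illustration only.

```python
from pyparsing import Word, alphas, alphanums, nums

identifier = Word(alphas, alphanums + "_")
number = Word(nums + ".")
assignmentExpr = identifier + "=" + (identifier | number)

text = "set pi=3.14159 then e=2.71828 done"

# scanString: a generator of (tokens, start, end) for each match.
scanned = [(list(tokens), text[start:end])
           for tokens, start, end in assignmentExpr.scanString(text)]

# searchString: just the matched tokens, one sublist per match.
found = [list(tokens) for tokens in assignmentExpr.searchString(text)]

# transformString: rewrite each match in place. The parse action here is a
# hypothetical masking rule that keeps the name and hides the value.
masker = assignmentExpr.copy().setParseAction(
    lambda tokens: tokens[0] + "=<value>")
masked = masker.transformString(text)

print(scanned)
print(found)
print(masked)
```

Unlike parseString, none of these methods requires the match to begin at the start of the input, which is why the surrounding words are simply skipped over.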

For now, let's stick with parseString, and I'll show you the other choices in more detail later.

Process the Results from Parsing the Input Text

Of course, the whole point of using the parser in the first place is to extract data from the input text. Many parsing tools simply return a list of the matched tokens to be further processed to interpret the meaning of the separate tokens. Pyparsing offers a rich object for results, called ParseResults. In its simplest form, ParseResults can be printed and accessed just like a Python list. For instance, continuing our assignment expression example, the following code:

assignmentTokens = assignmentExpr.parseString("pi=3.14159")
print(assignmentTokens)

prints out:

['pi', '=', '3.14159']
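The list-like behavior goes beyond printing. A short sketch, assuming Python 3 syntax and the same grammar as above, shows indexing, length, iteration, and conversion to a plain list:

```python
from pyparsing import Word, alphas, alphanums, nums

identifier = Word(alphas, alphanums + "_")
number = Word(nums + ".")
assignmentExpr = identifier + "=" + (identifier | number)

assignmentTokens = assignmentExpr.parseString("pi=3.14159")

first = assignmentTokens[0]        # indexing from the front
last = assignmentTokens[-1]        # negative indexing works too
count = len(assignmentTokens)      # number of matched tokens
as_list = list(assignmentTokens)   # conversion to a plain Python list

# Iteration visits the tokens in the order they were matched.
for position, token in enumerate(assignmentTokens):
    print(position, token)
```

Each token is still a string at this point; turning '3.14159' into a float is left to the calling code or to a parse action.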

But ParseResults can also support direct access to individual fields within the parsed text, if results names were assigned as part of the grammar definition. By enhancing the definition of assignmentExpr to use results names (such as lhs and rhs for the left- and right-hand sides of the assignment), we can access the fields as if they were attributes of the returned ParseResults:

assignmentExpr = identifier.setResultsName("lhs") + "=" + \
                 (identifier | number).setResultsName("rhs")
assignmentTokens = assignmentExpr.parseString("pi=3.14159")
print(assignmentTokens.rhs, "is assigned to", assignmentTokens.lhs)

prints out:

3.14159 is assigned to pi

Now that the introductions are out of the way, let's move on to some detailed examples.
