Basic Form of a Pyparsing Program

The prototypical pyparsing program has the following structure:

Import names from pyparsing module
Define grammar using pyparsing classes and helper methods
Use the grammar to parse the input text
Process the results from parsing the input text

Import Names from Pyparsing

In general, using the form from pyparsing import * is discouraged among Python style experts. It pollutes the local variable namespace with an unknown number of new names from the imported module. However, during pyparsing grammar development, it is hard to anticipate all of the parser element types and other pyparsing-defined names that will be needed, and this form simplifies early grammar development. After the grammar is mostly finished, you can go back to this statement and replace the * with the list of pyparsing names that you actually used.
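As a minimal sketch of that cleanup step, the wildcard import can be narrowed to an explicit list once you know which names the grammar uses. The four names below are simply the ones the examples in this section rely on; your own list will differ.

```python
# Explicit import list, replacing "from pyparsing import *" once the
# grammar has settled. Only the names actually used are pulled in.
from pyparsing import Word, alphas, alphanums, nums

# alphas, alphanums, and nums are plain strings of characters that Word
# accepts as its sets of allowed characters.
print(nums)
print(len(alphas), len(alphanums))
```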

Define the Grammar

The grammar is your definition of the text pattern that you want to extract from the input text. With pyparsing, the grammar takes the form of one or more Python statements that define text patterns, and combinations of patterns, using pyparsing classes and helpers to specify these individual pieces. Pyparsing allows you to use operators such as +, |, and ^ to simplify this code. For instance, if I use the pyparsing Word class to define a typical programming variable name consisting of a leading alphabetic character with a body of alphanumeric characters or underscores, I would start with the Python statement:

identifier = Word(alphas, alphanums+'_')

I might also want to parse numeric constants, either integer or floating point. A simplistic definition uses another Word instance, defining our number as a "word" composed of numeric digits, possibly including a decimal point:

number = Word(nums+".")

From here, I could then define a simple assignment statement as:

assignmentExpr = identifier + "=" + (identifier | number)

and now I have a grammar that will parse any of the following:

a = 10
a_2=100
pi=3.14159
goldenRatio = 1.61803
E = mc2
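To check this claim, the following sketch runs the grammar over each of the sample lines. The `samples` and `results` names are introduced here only for illustration. The camelCase method names match the ones used in this text; pyparsing 3 also provides snake_case equivalents such as `parse_string`.

```python
from pyparsing import Word, alphas, alphanums, nums

identifier = Word(alphas, alphanums + "_")
number = Word(nums + ".")
assignmentExpr = identifier + "=" + (identifier | number)

# Illustrative list of the sample inputs shown above.
samples = ["a = 10", "a_2=100", "pi=3.14159", "goldenRatio = 1.61803", "E = mc2"]

# Whitespace between tokens is skipped by default, so "a = 10" and
# "a_2=100" are handled by the same expression.
results = [assignmentExpr.parseString(s).asList() for s in samples]
for sample, tokens in zip(samples, results):
    print(sample, "->", tokens)
```

Each input yields three tokens: the name, the "=" sign, and the value, all as strings.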

In this part of the program you can also attach any parse-time callbacks (or parse actions) or define names for significant parts of the grammar to ease the job of locating those parts later. Parse actions are a very powerful feature of pyparsing, and I will also cover them later in detail.
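As a brief preview, here is one way a parse action might be attached. This is only a sketch: the lambda that converts the matched text to a float is an illustrative choice, and it assumes the simplistic number pattern has matched something `float()` accepts.

```python
from pyparsing import Word, alphas, alphanums, nums

identifier = Word(alphas, alphanums + "_")

# The parse action runs when number matches; its return value replaces
# the matched string in the results. (Illustrative conversion only: a
# match such as "1.2.3" would make float() raise an error.)
number = Word(nums + ".").setParseAction(lambda tokens: float(tokens[0]))

assignmentExpr = identifier + "=" + (identifier | number)

tokens = assignmentExpr.parseString("pi=3.14159")
print(tokens[0], type(tokens[2]).__name__)
```

With the parse action in place, the right-hand side arrives as a float rather than as the string '3.14159'.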

Best Practice: Start with a BNF

Before just diving in and writing a bunch of stream-of-consciousness Python code to represent your grammar, take a moment to put down on paper a description of the problem. Having this will:

Help clarify your thoughts on the problem
Guide your parser design
Give you a checklist of things to do as you implement your parser
Help you know when you are done

Fortunately, parser development has a simple notation for describing the layout of a grammar: Backus-Naur Form (BNF). You can find good examples of BNF at http://en.wikipedia.org/wiki/backus-naur_form. Your BNF notation need not be absolutely rigorous; just get a clear idea ahead of time of what your grammar needs to include.

For the BNFs we write in this book, we'll just use this abbreviated notation:

::= means "is defined as"
+ means "1 or more"
* means "0 or more"
items enclosed in [] are optional
succession of items means that matching tokens must occur in sequence
| means either item may occur
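To show how this notation can map onto pyparsing code, here is a sketch that writes a small BNF for the assignment example as comments and then translates it piece by piece. The BNF itself, the `program` rule, and the stricter number pattern are assumptions made for this illustration, not definitions taken from the text.

```python
# Hypothetical BNF for this sketch:
#   identifier ::= alpha (alphanum | "_")*
#   number     ::= digit+ ["." digit+]
#   value      ::= identifier | number
#   assignment ::= identifier "=" value
#   program    ::= assignment+
from pyparsing import Combine, OneOrMore, Optional, Word, alphas, alphanums, nums

identifier = Word(alphas, alphanums + "_")

digits = Word(nums)                                  # digit+
# [ ] becomes Optional; Combine joins the pieces into a single token
# and requires them to be adjacent, with no whitespace in between.
number = Combine(digits + Optional("." + digits))

value = identifier | number                          # | means either item
assignment = identifier + "=" + value                # succession becomes +
program = OneOrMore(assignment)                      # + means 1 or more

parsed = program.parseString("a = 10 pi = 3.14159 b = a").asList()
print(parsed)
```

The "0 or more" form would be written with ZeroOrMore in the same way.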

Use the Grammar to Parse the Input Text

In early versions of pyparsing, this step was limited to using the parseString method, as in:

assignmentTokens = assignmentExpr.parseString("pi=3.14159")

to retrieve the matching tokens as parsed from the input text.

The options for using your pyparsing grammar have increased since the early versions. With later releases of pyparsing, you can use any of the following:

parseString: Applies the grammar to the given input text.
scanString: Scans through the input text looking for matches; scanString is a generator function that yields the matched tokens, along with their start and end locations within the text, as each match is found.
searchString: A simple wrapper around scanString, returning a list containing each set of matched tokens within its own sublist.
transformString: Another wrapper around scanString, to simplify replacing matched tokens with modified text or replacement strings, or to strip text that matches the grammar.
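As a quick preview, the following sketch applies all four methods to the same input. The sample `text` and the `normalizer` expression with its `:=` rewriting are invented for illustration only.

```python
from pyparsing import Word, alphas, alphanums, nums

identifier = Word(alphas, alphanums + "_")
number = Word(nums + ".")
assignmentExpr = identifier + "=" + (identifier | number)

text = "x=1; y = 22; z=x"

# parseString matches at the start of the text and stops after one match.
first = assignmentExpr.parseString(text).asList()

# scanString yields (tokens, start, end) for every match in the text.
spans = [text[start:end] for tokens, start, end in assignmentExpr.scanString(text)]

# searchString collects the tokens of each match in its own sublist.
found = assignmentExpr.searchString(text).asList()

# transformString replaces each match with whatever the parse action
# returns. The copy keeps the original expression free of parse actions.
normalizer = assignmentExpr.copy().setParseAction(
    lambda tokens: "%s := %s" % (tokens[0], tokens[2])
)
rewritten = normalizer.transformString(text)

print(first)
print(spans)
print(found)
print(rewritten)
```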

For now, let's stick with parseString, and I'll show you the other choices in more detail later.

Process the Results from Parsing the Input Text

Of course, the whole point of using the parser in the first place is to extract data from the input text. Many parsing tools simply return a list of the matched tokens to be further processed to interpret the meaning of the separate tokens. Pyparsing offers a rich object for results, called ParseResults. In its simplest form, ParseResults can be printed and accessed just like a Python list. For instance, continuing our assignment expression example, the following code:

assignmentTokens = assignmentExpr.parseString("pi=3.14159")
print(assignmentTokens)

prints out:

['pi', '=', '3.14159']
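The list-like behavior can be sketched as follows; this assumes the same simple grammar as before and shows only a few of the list operations that ParseResults supports.

```python
from pyparsing import Word, alphas, alphanums, nums

identifier = Word(alphas, alphanums + "_")
number = Word(nums + ".")
assignmentExpr = identifier + "=" + (identifier | number)

assignmentTokens = assignmentExpr.parseString("pi=3.14159")

# Indexing, len(), and iteration work as they do for a list.
name = assignmentTokens[0]
value = assignmentTokens[-1]
count = len(assignmentTokens)
for token in assignmentTokens:
    print(token)

# asList() converts the results to a plain Python list.
plain = assignmentTokens.asList()
print(name, value, count, plain)
```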

But ParseResults can also support direct access to individual fields within the parsed text, if results names were assigned as part of the grammar definition. By enhancing the definition of assignmentExpr to use results names (such as lhs and rhs for the left- and right-hand sides of the assignment), we can access the fields as if they were attributes of the returned ParseResults:

assignmentExpr = (identifier.setResultsName("lhs") + "=" +
                  (identifier | number).setResultsName("rhs"))
assignmentTokens = assignmentExpr.parseString("pi=3.14159")
print(assignmentTokens.rhs, "is assigned to", assignmentTokens.lhs)

prints out:

3.14159 is assigned to pi
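Named fields can also be read in dictionary style. The sketch below is self-contained and uses the shorter call syntax `identifier("lhs")`, which pyparsing accepts as an abbreviation for `setResultsName`; the `fields` variable is introduced only for this illustration.

```python
from pyparsing import Word, alphas, alphanums, nums

identifier = Word(alphas, alphanums + "_")
number = Word(nums + ".")

# expr("name") is shorthand for expr.setResultsName("name").
assignmentExpr = identifier("lhs") + "=" + (identifier | number)("rhs")

assignmentTokens = assignmentExpr.parseString("pi=3.14159")

# The same fields are available as attributes, as dictionary keys,
# and all at once through asDict().
print(assignmentTokens.lhs, assignmentTokens["rhs"])
fields = assignmentTokens.asDict()
print(fields)

# The full token list is still there alongside the named fields.
print(assignmentTokens.asList())
```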

Now that the introductions are out of the way, let's move on to some detailed examples.
