9
THE ABSTRACT SYNTAX TREE, HY, AND LISP-LIKE ATTRIBUTES

image

The abstract syntax tree (AST) is a representation of the structure of the source code of any programming language. Every language, including Python, has a specific AST; Python’s AST is built by parsing a Python source file. Like any tree, this one is made of nodes linked together. A node can represent an operation, a statement, an expression, or even a module. Each node can contain references to other nodes that make up the tree.

Python’s AST is not heavily documented and is thus hard to deal with at first glance, but understanding some deeper aspects of how Python is constructed can help you master its usage.

This chapter will examine the AST of some simple Python commands to get you familiar with the structure and how it’s used. Once you’re familiar with the AST, we’ll build a program that can check for wrongly declared methods using flake8 and the AST. Finally, we’ll look at Hy, a Python-Lisp hybrid language built on the Python AST.

Looking at the AST

The easiest way to view the Python AST is to parse some Python code and dump the generated AST. For that, the Python ast module provides everything you need, as shown in Listing 9-1.

>>> import ast
>>> ast.parse
<function parse at 0x7f062731d950>
>>> ast.parse("x = 42")
<_ast.Module object at 0x7f0628a5ad10>
>>> ast.dump(ast.parse("x = 42"))
"Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Num(n=42))])"

Listing 9-1: Using the ast module to dump the AST generated by parsing code

The ast.parse() function parses any string that contains Python code and returns an _ast.Module object. That object is actually the root of the tree: you can browse it to discover every node making up the tree. To visualize what the tree looks like, you can use the ast.dump() function, which will return a string representation of the whole tree.

In Listing 9-1, the code x = 42 is parsed with ast.parse(), and the result is printed using ast.dump(). This abstract syntax tree can be rendered as shown in Figure 9-1, which shows the structure of the Python assign command.

image

Figure 9-1: The AST of the assign command in Python

The AST always starts with a root element, which is usually an _ast.Module object. This module object contains a list of statements or expressions to evaluate in its body attribute and usually represents the content of a file.

As you can probably guess, the ast.Assign object shown in Figure 9-1 represents an assignment, which is mapped to the = sign in the Python syntax. An ast.Assign object has a list of targets and a value to set the targets to. The list of targets in this case consists of one object, ast.Name, which represents a variable whose ID is x. The value is a number n with a value (in this case) 42. The ctx attribute stores a context, either ast.Store or ast.Load, depending on whether the variable is being used for reading or writing. In this case, the variable is being assigned a value, so an ast.Store context is used.

We could pass this AST to Python to be compiled and evaluated via the built-in compile() function. This function takes an AST as argument, the source filename, and a mode (either 'exec', 'eval', or 'single'). The source filename can be any name that you want your AST to appear to be from; it is common to use the string <input> as the source filename if the data does not come from a stored file, as shown in Listing 9-2.

>>> compile(ast.parse("x = 42"), '<input>', 'exec')
<code object <module> at 0x111b3b0, file "<input>", line 1>
>>> eval(compile(ast.parse("x = 42"), '<input>', 'exec'))
>>> x
42

Listing 9-2: Using the compile() function to compile data that is not from a stored file

The modes stand for execute (exec), evaluate (eval), and single statement (single). The mode should match what has been given to ast.parse(), whose default is exec.

  • The exec mode is the normal Python mode, used when an _ast.Module is the root of the tree.

  • The eval mode is a special mode that expects a single ast.Expression as the tree.

  • Finally, single is another special mode that expects a single statement or expression. If it gets an expression, sys.displayhook() will be called with the result, as when code is run in the interactive shell.

The root of the AST is ast.Interactive, and its body attribute is a list of nodes.

We could build an AST manually using the classes provided in the ast module. Obviously, this is a very long way to write Python code and not a method I would recommend! Nonetheless, it’s fun to do and helpful for learning about the AST. Let’s see what programming with the AST would look like.

Writing a Program Using the AST

Let’s write a good old "Hello world!" program in Python by building an abstract syntax tree manually.

>>> hello_world = ast.Str(s='hello world!', lineno=1, col_offset=1)
>>> print_name = ast.Name(id='print', ctx=ast.Load(), lineno=1, col_offset=1)
>>> print_call = ast.Call(func=print_name, ctx=ast.Load(),
   ... args=[hello_world], keywords=[], lineno=1, col_offset=1)
>>> module = ast.Module(body=[ast.Expr(print_call,
   ... lineno=1, col_offset=1)], lineno=1, col_offset=1)
>>> code = compile(module, '', 'exec')
   >>> eval(code)
   hello world!

Listing 9-3: Writing hello world! using the AST

In Listing 9-3, we build the tree one leaf at a time, where each leaf is an element (whether a value or an instruction) of the program.

The first leaf is a simple string : the ast.Str represents a literal string, which here contains the hello world! text. The print_name variable contains an ast.Name object, which refers to a variable—in this case, the print variable that points to the print() function.

The print_call variable contains a function call. It refers to the function name to call, the regular arguments to pass to the function call, and the keyword arguments. Which arguments are used depend on the functions being called. In this case, since it’s the print() function, we’ll pass the string we made and stored in hello_world.

At last, we create an _ast.Module object to contain all this code as a list of one expression. We can compile _ast.Module objects using the compile() function , which parses the tree and generates a native code object. These code objects are compiled Python code and can finally be executed by a Python virtual machine using eval!

This whole process is exactly what happens when you run Python on a .py file: once the text tokens are parsed, they are converted into a tree of ast objects, compiled, and evaluated.

NOTE

The arguments lineno and col_offset represent the line number and column offset, respectively, of the source code that has been used to generate the AST. It doesn’t make much sense to set these values in this context since we are not parsing a source file, but it can be useful to be able to find the position of the code that generated the AST. For example, Python uses this information when generating backtraces. Indeed, Python refuses to compile an AST object that doesn’t provide this information, so we pass fake values to these. You could also use the ast.fix_missing_locations() function to set the missing values to the ones set on the parent node.

The AST Objects

You can view the whole list of objects available in the AST by reading the _ast module documentation (note the underscore).

The objects are organized into two main categories: statements and expressions. Statements include types such as assert, assignment (=), augmented assignment (+=, /=, etc.), global, def, if, return, for, class, pass, import, raise, and so forth. Statements inherit from ast.stmt; they influence the control flow of a program and are often composed of expressions.

Expressions include types such as lambda, number, yield, name (variable), compare, and call. Expressions inherit from ast.expr; they differ from statements in that they usually produce a value and have no impact on the program flow.

There are also a few smaller categories, such as the ast.operator class, which defines standard operators such as add (+), div (/), and right shift (>>), and the ast.cmpop module, which defines comparisons operators.

The simple example here should give you an idea of how to build an AST from scratch. It’s easy to then imagine how you might leverage this AST to construct a compiler that would parse strings and generate code, allowing you to implement your own syntax to Python! This is exactly what led to the development of the Hy project, which we’ll discuss later in this chapter.

Walking Through an AST

To follow how a tree is built or access particular nodes, you sometimes need to walk through your tree, browsing it and iterating over the nodes. You can do this with the ast.walk() function. Alternatively, the ast module also provides NodeTransformer, a class that you can subclass to walk through an AST and modify particular nodes. Using NodeTransformer makes it easy to change code dynamically, as shown in Listing 9-4.

   import ast

   class ReplaceBinOp(ast.NodeTransformer):
       """Replace operation by addition in binary operation"""
       def visit_BinOp(self, node):
           return ast.BinOp(left=node.left,
                            op=ast.Add(),
                            right=node.right)

tree = ast.parse("x = 1/3")
   ast.fix_missing_locations(tree)
   eval(compile(tree, '', 'exec'))
   print(ast.dump(tree))
print(x)

tree = ReplaceBinOp().visit(tree)
   ast.fix_missing_locations(tree)
   print(ast.dump(tree))
   eval(compile(tree, '', 'exec'))
print(x)

Listing 9-4: Walking a tree with NodeTransformer to alter a node

The first tree object built is an AST that represents the expression x = 1/3. Once this is compiled and evaluated, the result of printing x at the end of the function is 0.33333, the expected result of 1/3.

The second tree object is an instance of ReplaceBinOp, which inherits from ast.NodeTransformer. It implements its own version of the ast.NodeTransformer.visit() method and changes any ast.BinOp operation to an ast.BinOp that executes ast.Add. Concretely, this changes any binary operator (+, -, /, and so on) to the + operator. When this second tree is compiled and evaluated , the result is now 4, which is the result of 1 + 3, because the / in the first object is replaced with +.

You can see the execution of the program here:

Module(body=[Assign(targets=[Name(id='x', ctx=Store())],
                    value=BinOp(left=Num(n=1), op=Div(), right=Num(n=3)))])
0.3333333333333333
Module(body=[Assign(targets=[Name(id='x', ctx=Store())],
                    value=BinOp(left=Num(n=1), op=Add(), right=Num(n=3)))])
4

NOTE

If you need to evaluate a string that should return a simple data type, you can use ast.literal_eval. As a safer alternative to eval, it prevents the input string from executing any code.

Extending flake8 with AST Checks

In Chapter 7, you learned that methods that do not rely on the object state should be declared static with the @staticmethod decorator. The problem is that a lot of developers simply forget to do so. I’ve personally spent too much time reviewing code and asking people to fix this problem.

We’ve seen how to use flake8 to do some automatic checking in the code. In fact, flake8 is extensible and can provide even more checks. We’ll write a flake8 extension that checks for static method declaration omission by analyzing the AST.

Listing 9-5 shows an example of one class that omits the static declaration and one that correctly includes it. Write this program out and save it as ast_ext.py; we’ll use it in a moment to write our extension.

class Bad(object):
    # self is not used, the method does not need
    # to be bound, it should be declared static
    def foo(self, a, b, c):
        return a + b - c

class OK(object):
    # This is correct
    @staticmethod
    def foo(a, b, c):
        return a + b - c

Listing 9-5: Omitting and including @staticmethod

Though the Bad.foo method works fine, strictly speaking it is more correct to write it as OK.foo (turn back to Chapter 7 for more detail on why). To check whether all the methods in a Python file are correctly declared, we need to do the following:

  • Iterate over all the statement nodes of the AST.

  • Check that the statement is a class definition (ast.ClassDef).

  • Iterate over all the function definitions (ast.FunctionDef) of that class statement to check whether it is already declared with @staticmethod.

  • If the method is not declared static, check whether the first argument (self) is used somewhere in the method. If self is not used, the method can be tagged as potentially miswritten.

The name of our project will be ast_ext. To register a new plugin in flake8, we need to create a packaged project with the usual setup.py and setup.cfg files. Then, we just need to add an entry point in the setup.cfg of our ast_ext project.

[entry_points]
flake8.extension =
    --snip--
    H904 = ast_ext:StaticmethodChecker
    H905 = ast_ext:StaticmethodChecker

Listing 9-6: Allowing flake8 plugins for our chapter

In Listing 9-6, we also register two flake8 error codes. As you’ll notice later, we are actually going to add an extra check to our code while we’re at it!

The next step is to write the plugin.

Writing the Class

Since we are writing a flake8 check of the AST, the plugin needs to be a class following a certain signature, as shown in Listing 9-7.

class StaticmethodChecker(object):
    def __init__(self, tree, filename):
        self.tree = tree

    def run(self):
        pass

Listing 9-7: The class for checking the AST

The default template is easy to understand: it stores the tree locally for use in the run() method, which will yield the problems that are discovered. The value that will be yielded must follow the expected PEP 8 signature: a tuple of the form (lineno, col_offset, error_string, code).

Ignoring Irrelevant Code

As indicated earlier, the ast module provides the walk() function, which allows you to iterate easily on a tree. We’ll use that to walk through the AST and find out what to check and what not to check.

First, let’s write a loop that ignores the statements that are not class definitions. Add this to your ast_ext project, as shown in Listing 9-8; code that should stay the same is grayed out.

class StaticmethodChecker(object):
    def __init__(self, tree, filename):
        self.tree = tree

    def run(self):
        for stmt in ast.walk(self.tree):
            # Ignore non-class
            if not isinstance(stmt, ast.ClassDef):
                continue

Listing 9-8: Ignoring statements that are not class definitions

The code in Listing 9-8 is still not checking for anything, but now it knows how to ignore statements that are not class definitions. The next step is to set our checker to ignore anything that is not a function definition.

for stmt in ast.walk(self.tree):
    # Ignore non-class
    if not isinstance(stmt, ast.ClassDef):
        continue
    # If it's a class, iterate over its body member to find methods
    for body_item in stmt.body:
        # Not a method, skip
        if not isinstance(body_item, ast.FunctionDef):
            continue

Listing 9-9: Ignoring statements that are not function definitions

In Listing 9-9, we ignore irrelevant statements by iterating over the attributes of the class definition.

Checking for the Correct Decorator

We’re all set to write the checking method, which is stored in the body_item attribute. First, we need to check whether the method that’s being checked is already declared as static. If it is, we don’t have to do any further checking and can bail out.

for stmt in ast.walk(self.tree):
    # Ignore non-class
    if not isinstance(stmt, ast.ClassDef):
        continue
    # If it's a class, iterate over its body member to find methods
    for body_item in stmt.body:
        # Not a method, skip
        if not isinstance(body_item, ast.FunctionDef):
            continue
        # Check that it has a decorator
        for decorator in body_item.decorator_list:
            if (isinstance(decorator, ast.Name)
               and decorator.id == 'staticmethod'):
                # It's a static function, it's OK
                break
        else:
            # Function is not static, we do nothing for now
            Pass

Listing 9-10: Checking for the static decorator

Note that in Listing 9-10, we use the special for/else form of Python, where the else is evaluated unless we use break to exit the for loop. At this point, we’re able to detect whether a method is declared static.

Looking for self

The next step is to check whether the method that isn’t declared as static uses the self argument. First, check whether the method includes any arguments at all, as shown in Listing 9-11.

--snip--
        # Check that it has a decorator
        for decorator in body_item.decorator_list:
            if (isinstance(decorator, ast.Name)
               and decorator.id == 'staticmethod'):
                # It's a static function, it's OK
                break
        else:
            try:
                first_arg = body_item.args.args[0]
            except IndexError:
                yield (
                    body_item.lineno,
                    body_item.col_offset,
                    "H905: method misses first argument",
                    "H905",
                )
                # Check next method
                Continue

Listing 9-11: Checking the method for arguments

We finally added a check! This try statement in Listing 9-11 grabs the first argument from the method signature. If the code fails to retrieve the first argument from the signature because a first argument doesn’t exist, we already know there’s a problem: you can’t have a bound method without the self argument. If the plugin detects that case, it raises the H905 error code we set earlier, signaling a method that misses its first argument.

NOTE

PEP 8 codes follow a particular format for error codes (a letter followed by a number), but there are no rules as to which code to pick. You could come up with any other code for this error, as long as it’s not already used by PEP 8 or another extension.

Now you know why we registered two error codes in setup.cfg: we had a good opportunity to kill two birds with one stone.

The next step is to check whether the self argument is used in the code of the method.

--snip--
            try:
                first_arg = body_item.args.args[0]
            except IndexError:
                yield (
                    body_item.lineno,
                    body_item.col_offset,
                    "H905: method misses first argument",
                    "H905",
                )
                # Check next method
                continue
            for func_stmt in ast.walk(body_item):
                # The checking method must differ between Python 2 and Python 3
                if six.PY3:
                    if (isinstance(func_stmt, ast.Name)
                       and first_arg.arg == func_stmt.id):
                        # The first argument is used, it's OK
                        break
                else:
                    if (func_stmt != first_arg
                       and isinstance(func_stmt, ast.Name)
                       and func_stmt.id == first_arg.id):
                        # The first argument is used, it's OK
                        break
            else:
                yield (
                    body_item.lineno,
                    body_item.col_offset,
                    "H904: method should be declared static",
                    "H904",
                )

Listing 9-12: Checking the method for the self argument

To check whether the self argument is used in the method’s body, the plugin in Listing 9-12 iterates recursively, using ast.walk on the body and looking for the use of the variable named self. If the variable isn’t found, the program finally yields the H904 error code. Otherwise, nothing happens, and the code is considered sane.

NOTE

As you may have noticed, the code walks over the module AST definition several times. There might be some degree of optimization to browsing the AST in only one pass, but I’m not sure it’s worth it, given how the tool is actually used. I’ll leave that exercise to you, dear reader.

Knowing the Python AST is not strictly necessary for using Python, but it does give powerful insight into how the language is built and how it works. It thus gives you a better understanding of how the code you write is being used under the hood.

A Quick Introduction to Hy

Now that you have a good understanding of how Python AST works, you can start dreaming of creating a new syntax for Python. You could parse this new syntax, build an AST out of it, and compile it down to Python code.

This is exactly what Hy does. Hy is a Lisp dialect that parses a Lisp-like language and converts it to regular Python AST, making it fully compatible with the Python ecosystem. You could compare it to what Clojure is to Java. Hy could fill a book by itself, so we will only skim over it. Hy uses the syntax and some features of the Lisp family of languages: it’s functionally oriented, provides macros, and is easily extensible.

If you’re not already familiar with Lisp—and you should be—the Hy syntax will look familiar. Once you install Hy (by running pip install hy), launching the hy interpreter will give you a standard REPL prompt from which you can start to interact with the interpreter, as shown in Listing 9-13.

% hy
hy 0.9.10
=> (+ 1 2)
3

Listing 9-13: Interacting with the Hy interpreter

For those not familiar with the Lisp syntax, parentheses are used to construct lists. If a list is unquoted, it is evaluated: the first element must be a function, and the rest of the items from the list are passed as arguments. Here the code (+ 1 2) is equivalent to 1 + 2 in Python.

In Hy, most constructs, such as function definitions, are mapped from Python directly.

=> (defn hello [name]
...  (print "Hello world!")
...  (print (% "Nice to meet you %s" name)))
=> (hello "jd")
Hello world!
Nice to meet you jd

Listing 9-14: Mapping a function definition from Python

As shown in Listing 9-14, internally Hy parses the code provided, converts it to a Python AST, compiles it, and evaluates it. Fortunately, Lisp is an easy tree to parse: each pair of parentheses represents a node of the tree, meaning the conversion is actually easier than for the native Python syntax!

Class definition is supported through the defclass construct, which is inspired by the Common Lisp Object System (CLOS).

(defclass A [object]
  [[x 42]
   [y (fn [self value]
        (+ self.x value))]])

Listing 9-15: Defining a class with defclass

Listing 9-15 defines a class named A, which inherits from object, with a class attribute x whose value is 42; then a method y returns the x attribute plus a value passed as argument.

What’s really wonderful is that you can import any Python library directly into Hy and use it with no penalty. Use the import() function to import a module, as shown in Listing 9-16, just as you would with regular Python.

=> (import uuid)
=> (uuid.uuid4)
UUID('f823a749-a65a-4a62-b853-2687c69d0e1e')
=> (str (uuid.uuid4))
'4efa60f2-23a4-4fc1-8134-00f5c271f809'

Listing 9-16: Importing regular Python modules

Hy also has more advanced constructs and macros. In Listing 9-17, admire what the cond() function can do for you instead of the classic but verbose if/elif/else.

(cond
 [(> somevar 50)
  (print "That variable is too big!")]
 [(< somevar 10)
  (print "That variable is too small!")]
 [true
  (print "That variable is jusssst right!")])

Listing 9-17: Using cond instead of if/elif/else

The cond macro has the following signature: (cond [condition_expression return_expression] ...). Each condition expression is evaluated, starting with the first: as soon as one of the condition expressions returns a true value, the return expression is evaluated and returned. If no return expression is provided, then the value of the condition expression is returned. Thus, cond is equivalent to an if/elif construct, except that it can return the value of the condition expression without having to evaluate it twice or store it in a temporary variable!

Hy allows you to jump into the Lisp world without leaving your comfort zone too far behind you, since you’re still writing Python. The hy2py tool can even show you what your Hy code would look like once translated into Python. While Hy is not widely used, it is a great tool to show the potential of the Python language. If you’re interested in learning more, I suggest you check out the online documentation and join the community.

Summary

Just like any other programming language, Python source code can be represented using an abstract tree. You’ll rarely use the AST directly, but when you understand how it works, it can provide a helpful perspective.

Paul Tagliamonte on the AST and Hy

Paul created Hy in 2013, and, as a Lisp lover, I joined him in this fabulous adventure. Paul is currently a developer at Sunlight Foundation.

How did you learn to use the AST correctly, and do you have any advice for people looking at it?

The AST is extremely underdocumented, so most knowledge comes from generated ASTs that have been reverse engineered. By writing up simple Python scripts, one can use something similar to import ast; ast.dump(ast.parse("print foo")) to generate an equivalent AST to help with the task. With a bit of guesswork, and some persistence, it’s not untenable to build up a basic understanding this way.

At some point, I’ll take on the task of documenting my understanding of the AST module, but I find writing code is the best way to learn the AST.

How does Python’s AST differ between versions and uses?

Python’s AST is not private, but it’s not a public interface either. No stability is guaranteed from version to version—in fact, there are some rather annoying differences between Python 2 and 3 and even within different Python 3 releases. In addition, different implementations may interpret the AST differently or even have a unique AST. Nothing says Jython, PyPy, or CPython must deal with the Python AST in the same way.

For instance, CPython can handle slightly out-of-order AST entries (by the lineno and col_offset), whereas PyPy will throw an assertion error. Though sometimes annoying, the AST is generally sane. It’s not impossible to build an AST that works on a vast number of Python instances. With a conditional or two, it’s only mildly annoying to create an AST that works on CPython 2.6 through 3.3 and PyPy, making this tool quite handy.

What was your process in creating Hy?

I started on Hy following a conversation about how useful it would be to have a Lisp that compiles to Python rather than Java’s JVM (Clojure). A few short days later, and I had the first version of Hy. This version resembled a Lisp and even worked like a proper Lisp in some ways, but it was slow. I mean, really slow. It was about an order of magnitude slower than native Python, since the Lisp runtime itself was implemented in Python.

Frustrated, I almost gave up, but then a coworker suggested using the AST to implement the runtime, rather than implementing the runtime in Python. This suggestion was the catalyst for the entire project. I spent my entire holiday break in 2012 hacking on Hy. A week or so later, I had something that resembled the current Hy codebase.

Just after getting enough of Hy working to implement a basic Flask app, I gave a talk at Boston Python about the project, and the reception was incredibly warm—so warm, in fact, that I start to view Hy as a good way to teach people about Python internals, such as how the REPL works, PEP 302 import hooks, and the Python AST. This was a good introduction to the concept of code that writes code.

I rewrote chunks of the compiler to fix some philosophical issues in the process, leading us to the current iteration of the codebase—which has stood up quite well!

Learning Hy is also a good way to begin understanding how to read Lisp. Users can get comfortable with s-expressions in an environment they know and even use libraries they’re already using, easing the transition to other Lisps, such as Common Lisp, Scheme, or Clojure.

How interoperable with Python is Hy?

Hy is amazingly interoperable. So much so that pdb can properly debug Hy without you having to make any changes at all. I’ve written Flask apps, Django apps, and modules of all sorts with Hy. Python can import Python, Hy can import Hy, Hy can import Python, and Python can import Hy. This is what really makes Hy unique; other Lisp variants like Clojure are purely unidirectional. Clojure can import Java, but Java has one hell of a time importing Clojure.

Hy works by translating Hy code (in s-expressions) into the Python AST almost directly. This compilation step means the generated bytecode is fairly sane stuff, which means Python has a very hard time of even telling the module isn’t written in Python at all.

Common Lisp-isms, such as *earmuffs* or using-dashes are fully supported by translating them into a Python equivalent (in this case, *earmuffs* becomes EARMUFFS, and using-dashes becomes using_dashes), which means Python doesn’t have a hard time using them at all.

Ensuring that we have really good interoperability is one of our highest priorities, so if you see any bugs—file them!

What are the advantages and disadvantages of choosing Hy?

One advantage of Hy is that it has a full macro system, which Python struggles with. Macros are special functions that alter the code during the compile step. This makes it easy to create new domain-specific languages, which are composed of the base language (in this case, Hy/Python) along with many macros that allow uniquely expressive and succinct code.

As for downsides, Hy, by virtue of being a Lisp written in s-expressions, suffers from the stigma of being hard to learn, read, or maintain. People might be averse to working on projects using Hy for fear of its complexity.

Hy is the Lisp everyone loves to hate. Python folks may not enjoy its syntax, and Lispers may avoid it because Hy uses Python objects directly, meaning the behavior of fundamental objects can sometimes be surprising to the seasoned Lisper.

Hopefully people will look past its syntax and consider exploring parts of Python previously untouched.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.188.201