parser
The parser module provides an interface to Python's internal
parser and byte-code compiler. The primary purpose for this interface
is to allow Python code to edit the parse tree of a Python expression
and create executable code from it. This is better than parsing and
modifying an arbitrary Python code fragment as a string, because the
parsing is performed in a manner identical to that used for the code
forming the application. It is also faster.
There are a few things to note about this module which are important
to making use of the data structures created. This is not a tutorial
on editing the parse trees for Python code, but some examples of using
the parser module are presented.
Most importantly, a good understanding of the Python grammar processed
by the internal parser is required. For full information on the
language syntax, refer to the Language Reference. The parser itself
is created from a grammar specification defined in the file
`Grammar/Grammar' in the standard Python distribution. The parse
trees stored in the ``AST objects'' created by this module are the
actual output from the internal parser when created by the expr() or
suite() functions, described below. The AST objects created by
sequence2ast() faithfully simulate those
structures. Be aware that the values of the sequences which are
considered ``correct'' will vary from one version of Python to another
as the formal grammar for the language is revised. However,
transporting code from one Python version to another as source text
will always allow correct parse trees to be created in the target
version, with the only restriction being that migrating to an older
version of the interpreter will not support more recent language
constructs. The parse trees are not typically compatible from one
version to another, whereas source code has always been
forward-compatible.
Each element of the sequences returned by ast2list() or ast2tuple()
has a simple form. Sequences representing
non-terminal elements in the grammar always have a length greater than
one. The first element is an integer which identifies a production in
the grammar. These integers are given symbolic names in the C header
file `Include/graminit.h' and the Python module symbol. Each
additional element of the sequence represents
a component of the production as recognized in the input string: these
are always sequences which have the same form as the parent. An
important aspect of this structure which should be noted is that
keywords used to identify the parent node type, such as the keyword
if in an if_stmt, are included in the node tree without any special
treatment. For example, the if keyword is represented by the tuple
(1, 'if'), where 1 is the numeric value associated with all NAME
tokens, including variable and function names defined by the user.
In an alternate form returned when line number information is
requested, the same token might be represented as (1, 'if', 12),
where the 12
represents the line number at which the terminal symbol was found.
Terminal elements are represented in much the same way, but without
any child elements and with the addition of the source text which was
identified. The example of the if keyword above is representative.
The various types of terminal symbols are defined in the C header
file `Include/token.h' and the Python module token.
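The sequence form described above can be explored without an AST
object. The following is a minimal sketch, assuming a hand-built tuple
in that form. The production numbers IF_STMT, TEST and SUITE are
hypothetical placeholders rather than values taken from
`Include/graminit.h', and the helper name iter_terminals is introduced
here only for illustration. The terminal type numbers come from the
token module.

```python
import token

# Hypothetical production numbers chosen for illustration only; real values
# are listed in the symbol module and differ between Python versions.
IF_STMT, TEST, SUITE = 1001, 1002, 1003

# A hand-built tree in the tuple form: (production, child, child, ...)
# for non-terminals and (token_type, source_text[, line_number]) for terminals.
SAMPLE_TREE = (
    IF_STMT,
    (token.NAME, 'if', 12),
    (TEST, (token.NAME, 'x', 12)),
    (token.COLON, ':', 12),
    (SUITE, (token.NAME, 'pass', 12)),
)


def iter_terminals(node):
    """Yield (type_name, text, line_number) for each terminal under node."""
    if isinstance(node[1], str):
        # Terminal: element 1 is the source text, element 2 the optional line.
        line_number = node[2] if len(node) > 2 else None
        yield token.tok_name[node[0]], node[1], line_number
    else:
        # Non-terminal: every element after the first is a child sequence.
        for child in node[1:]:
            yield from iter_terminals(child)


terminals = list(iter_terminals(SAMPLE_TREE))
```

In this sketch the keyword if and the identifier x are both reported
with the type name NAME, which matches the absence of any special
treatment for keywords noted above.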
The AST objects are not required to support the functionality of this
module, but are provided for three purposes: to allow an application
to amortize the cost of processing complex parse trees, to provide a
parse tree representation which conserves memory space when compared
to the Python list or tuple representation, and to ease the creation
of additional modules in C which manipulate parse trees. A simple
``wrapper'' class may be created in Python to hide the use of AST
objects; the AST library module provides a variety of such
classes.
The parser module defines functions for a few distinct
purposes. The most important purposes are to create AST objects and
to convert AST objects to other representations such as parse trees
and compiled code objects, but there are also functions which serve to
query the type of parse tree represented by an AST object.
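The purposes listed above can be sketched as a round trip: source text
to AST object, AST object to tuple, tuple back to AST object, and
finally to a code object. This is a hedged sketch rather than a
definitive recipe. It assumes that the parser module may be missing
from the running interpreter, in which case the hypothetical helper
evaluate_via_parse_tree returns None. It also assumes that later
releases spell the functions st2tuple, sequence2st and compilest; those
names do not appear in this section, which uses ast2tuple and
sequence2ast, while compileast is the matching name from the older
releases.

```python
import warnings

try:
    with warnings.catch_warnings():
        # Assumption: some releases warn that the module is deprecated.
        warnings.simplefilter("ignore")
        import parser
except ImportError:
    parser = None  # the module is not available in this interpreter


def evaluate_via_parse_tree(source):
    """Round-trip an expression through a parse tree and evaluate it.

    Returns None when the parser module is unavailable.
    """
    if parser is None:
        return None

    # Assumption: try the names used in this section first, then the
    # spellings used by later releases.
    to_tuple = getattr(parser, "ast2tuple", None) or parser.st2tuple
    from_sequence = getattr(parser, "sequence2ast", None) or parser.sequence2st
    compile_tree = getattr(parser, "compileast", None) or parser.compilest

    tree = parser.expr(source)        # AST object from the internal parser
    as_tuple = to_tuple(tree)         # nested tuple form of the parse tree
    rebuilt = from_sequence(as_tuple)  # AST object simulated from the tuple
    return eval(compile_tree(rebuilt))  # code object evaluated to a value


result = evaluate_via_parse_tree("2 + 3")
```

An application that edits the tree would modify as_tuple (or the list
form) between the conversion steps; that editing step is omitted here
because correct sequences depend on the grammar of the Python version
in use.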
guido@CNRI.Reston.Va.US