Library for programming Abstract Syntax Trees in Python - python

I'm creating a tree to represent a simple language. I'm very familiar with Abstract Syntax Trees, and have worked on frameworks for building and using them in C++. Is there a standard Python library for specifying or manipulating arbitrary ASTs? Failing that, is there a tree library that is useful for the same purpose?
Note that I am not manipulating Python ASTs, so I think the ast module isn't suitable.

ASTs are very simple to implement in Python. For example, for my pycparser project (a complete C parser in Python) I've implemented ASTs based on ideas borrowed from Python's modules. The various AST nodes are specified in a YAML configuration file, and I generate Python code for these nodes in Python itself.
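To illustrate how little code this takes, here is a minimal hand-written sketch. It is not pycparser's actual code: the Node base class and the Num and BinOp node types are hypothetical names made up for the example, and the _fields convention is borrowed from the standard library's ast module.

```python
class Node:
    """Base class for AST nodes; subclasses list their field names in _fields."""
    _fields = ()

    def __init__(self, *args):
        if len(args) != len(self._fields):
            raise TypeError(
                f"{type(self).__name__} expects {len(self._fields)} arguments"
            )
        for name, value in zip(self._fields, args):
            setattr(self, name, value)

    def children(self):
        """Yield child nodes, looking inside list-valued fields as well."""
        for name in self._fields:
            value = getattr(self, name)
            if isinstance(value, Node):
                yield value
            elif isinstance(value, list):
                for item in value:
                    if isinstance(item, Node):
                        yield item

    def __repr__(self):
        args = ", ".join(repr(getattr(self, name)) for name in self._fields)
        return f"{type(self).__name__}({args})"


# Hypothetical node types for a tiny arithmetic language.
class Num(Node):
    _fields = ("value",)


class BinOp(Node):
    _fields = ("op", "left", "right")


tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
print(tree)  # BinOp('+', Num(1), BinOp('*', Num(2), Num(3)))
```

Because every node type is just a class with a tuple of field names, generating such classes from a configuration file is mostly a matter of string templating.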

pyast is a package for building declarative abstract syntax trees.

If you represent your grammar elements as expressions in pyparsing, you can attach a parse action to each expression which returns a class instance containing the parsed tokens in a parser-specific type. There are a couple of examples on the pyparsing wiki that illustrate this technique (invRegex.py, simpleBool.py and evalArith.py). (These grammars all use the operatorPrecedence built-in, which can obscure some of the grammar structure, but

This blog entry, though short on implementation detail, describes a nice interface that Python ASTs could implement.
http://chris-lamb.co.uk/2006/12/08/visitor-pattern-in-python/
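For reference, a common way to write a visitor in Python is to dispatch on the node's class name, the same approach the standard library's ast.NodeVisitor takes. The sketch below is not necessarily the interface from the blog entry; Num, Add, Mul, Visitor and Evaluator are hypothetical names used only for illustration.

```python
class Num:
    def __init__(self, value):
        self.value = value


class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right


class Mul:
    def __init__(self, left, right):
        self.left = left
        self.right = right


class Visitor:
    """Dispatch visit(node) to a method named visit_<ClassName>."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        raise NotImplementedError(f"no visit_{type(node).__name__} method")


class Evaluator(Visitor):
    """Evaluate an arithmetic tree by visiting it bottom-up."""

    def visit_Num(self, node):
        return node.value

    def visit_Add(self, node):
        return self.visit(node.left) + self.visit(node.right)

    def visit_Mul(self, node):
        return self.visit(node.left) * self.visit(node.right)


expr = Add(Num(1), Mul(Num(2), Num(3)))
result = Evaluator().visit(expr)
print(result)  # 7
```

The node classes stay free of behaviour, so a pretty-printer or type checker can be added later as another Visitor subclass without touching them.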

Related

Python xml marshalling

I'm learning Python; my background is Java EE. I have used JAXB before, where I can define a regular class, add some annotations, and then use JAXB to marshal objects to XML. This means I am not concerned with creating root elements, nodes, etc., but merely with writing the Java class and annotating it here and there. Is there anything like this for Python?
Here are a few:
lxml.objectify
gnosis.xml.objectify
pyxser seems pretty cool
Pickle to XML - uses Python's pickle and xml.dom.minidom
PyXML: from xml import marshal (might be buggy)
Amara might be worth looking into.
PyXB seems to be the closest thing to JAXB although I haven't used it yet. I use lxml at the moment and find it works well.
Amara was promising but seemed to stagnate.
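If pulling in a library is not an option, the basic idea can be approximated with the standard library alone. The sketch below does not use any of the packages listed above and is far less capable than JAXB: it assumes the objects are dataclasses (Python 3.7+), and to_element, Person and Address are hypothetical names made up for the example.

```python
from dataclasses import dataclass, fields, is_dataclass
import xml.etree.ElementTree as ET


def to_element(obj, tag=None):
    """Convert a dataclass instance into an Element, one child per field."""
    elem = ET.Element(tag or type(obj).__name__.lower())
    for f in fields(obj):
        value = getattr(obj, f.name)
        if is_dataclass(value):
            # Nested dataclasses become nested elements named after the field.
            elem.append(to_element(value, f.name))
        else:
            child = ET.SubElement(elem, f.name)
            child.text = str(value)
    return elem


@dataclass
class Address:
    city: str
    zip_code: str


@dataclass
class Person:
    name: str
    age: int
    address: Address


person = Person("Ada", 36, Address("London", "N1"))
xml_text = ET.tostring(to_element(person), encoding="unicode")
print(xml_text)
# <person><name>Ada</name><age>36</age><address><city>London</city><zip_code>N1</zip_code></address></person>
```

This only covers marshalling plain fields and nested objects; lists, attributes, namespaces and unmarshalling are where the dedicated libraries earn their keep.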

What parser generator does CPython use?

I was reading this page in the documentation, and noticed that it says
This is the full Python grammar, as it is read by the parser generator
and used to parse Python source files
However, I'm having difficulty finding out what parser generator CPython uses. So what parser generator does CPython use? Are there other parser generators that would take the grammar on that page without any modifications?
Python is open-source, so you can inspect the source code...
In the Python source directory is a "Parser" directory containing "Python.asdl" with the note
-- ASDL's four builtin types are identifier, int, string, object
There's also an "asdl.py" file in the same directory...
"""An implementation of the Zephyr Abstract Syntax Definition Language.
See http://asdl.sourceforge.net/ and
http://www.cs.princeton.edu/research/techreps/TR-554-97
Only supports top level module decl, not view. I'm guessing that view
is intended to support the browser and I'm not interested in the
browser.
Changes for Python: Add support for module versions
"""
So it appears that it is a custom parser generator. LALR(1) parser generators are not so hard to write.

ISO human-readable parser for Python in Python

I'm looking for a parser for Python (preferably v. 2.7) written in human-readable Python. Performance or flexibility are not important. The accuracy/correctness of the parsing and the clarity of the parser's code are far more important considerations here.
Searching online I've found a few parser generators that generate human-readable Python code, but I have not found the corresponding Python grammar to go with any of them (from what I could see, they all follow different grammar specification conventions). At any rate, even if I could find a suitable parser-generator/Python grammar combo, a readily available Python parser that fits my requirements (human-readable Python code) is naturally far more preferable.
Any suggestions?
Thanks!
PyPy is a Python implementation written entirely in Python. I am not an expert, but here's the link to their parser which - obviously - has been written in Python itself:
https://bitbucket.org/pypy/pypy/src/819faa2129a8/pypy/interpreter/pyparser
I think you should invest your effort in the ast module. An excerpt from the Python docs:
The ast module helps Python applications to process trees of the
Python abstract syntax grammar. The abstract syntax itself might
change with each Python release; this module helps to find out
programmatically what the current grammar looks like.
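As a quick taste of what the ast module gives you, here is a small sketch using only ast.parse and ast.walk. It is written for Python 3, and the source string is an arbitrary example.

```python
import ast

source = "total = price * (1 + tax)"
tree = ast.parse(source)

# Collect every identifier that appears anywhere in the tree.
names = sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})
print(names)  # ['price', 'tax', 'total']

# The first statement is an assignment whose value is a multiplication.
assign = tree.body[0]
print(type(assign).__name__, type(assign.value.op).__name__)  # Assign Mult
```

Note that ast hands you the tree produced by the interpreter's own parser, so it is accurate, but the parser itself is not written in readable Python.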

Python parser Module tutorial

I am writing an application which reads an input file that currently has its own grammar, which is processed by lex/yacc.
I'm looking to modify this so as to make this input file a Python script instead, and was wondering if someone can point me to a beginner's guide to using the parser module in Python. I'm fairly new to Python itself, but have worked through a fair chunk of the online tutorial.
From what I have researched, I know there are options (such as pyparsing) which can allow me to keep the existing grammar and use Pyparsing as a replacement for lex/yacc. However, I am curious to learn the Python parser module in more detail and explore its feasibility.
Thanks.
You mean the parser module? It's a parser for Python source code only, not a general purpose parser. You can't use it to parse anything else.
As Jochen said, the parser module is for parsing Python code. I think you're best off checking out Ned Batchelder's list of parsers. PyParsing does things pretty differently from Lex and Yacc, so I'm not sure why you think you could keep your existing grammar and lexer. A better bet might be David Beazley's PLY toolkit. It's solid and has excellent documentation.
I recommend that you check out https://github.com/erezsh/lark
It's great for newcomers to parsing: It can parse ALL context-free grammars, it automatically builds an AST (with line & column numbers), and it accepts the grammar in EBNF format, which is considered the standard and is very easy to write.

Structured python docstrings, IDE-friendly

In PHP I was used to PHPdoc syntax:
/** Do something useful
@param first Primary data
@return int
@throws BadException
*/
function($first){ ...
This gives a short, useful reference, which is very handy when all you need is to recall what something does, especially for third-party libraries. Also, all IDEs can display it in popup hints.
It seems there are no such conventions in Python: docstrings are just plain text. Plain text describes things well, but it's too long to serve as a digest.
Ok, let it be. But in my applications I don't want to use piles of plaintext.
Are there any well-known conventions to follow? And how to document class attributes?! PyCharm IDE recipes are especially welcome :)
In Python 3 there's PEP 3107 for function annotations. That's not useful for 2.x (2.6, specifically).
There's also PEP 0287 for reStructuredText: fancy, but still not structured.
I use epydoc. It supports comments in reStructured Text, and it generates HTML documentation from those comments (akin to javadoc).
The numpydoc standard is well-defined, based on reStructuredText (which is standard within the Python ecosystem), and has Sphinx integration. It should be relatively straightforward to write a plugin for PyCharm which can digest numpydoc.
Sphinx also has references on how to document attributes: http://sphinx.pocoo.org/ext/autodoc.html?highlight=autoattribute
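For comparison with the PHPdoc snippet in the question, here is a sketch of a reStructuredText docstring using Sphinx-style field lists (:param:, :type:, :returns:, :rtype:, :raises:), plus a "#:" comment, which Sphinx autodoc picks up as attribute documentation. Worker, BadException and the attribute names are hypothetical and exist only for the example.

```python
class BadException(Exception):
    """Raised when the input cannot be processed."""


class Worker:
    """Does something useful with primary data.

    :ivar retries: number of attempts made before giving up
    """

    #: Default number of attempts (Sphinx autodoc reads ``#:`` comments).
    default_retries = 3

    def __init__(self, retries=default_retries):
        self.retries = retries

    def process(self, first):
        """Do something useful.

        :param first: primary data
        :type first: str
        :returns: the length of the data
        :rtype: int
        :raises BadException: if ``first`` is empty
        """
        if not first:
            raise BadException("first must not be empty")
        return len(first)


print(Worker().process("abc"))  # 3
```

The field lists stay short enough to work as a digest, and the docstring is still available at runtime through Worker.process.__doc__.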
