I'm looking for a parser for Python (preferably v2.7) written in human-readable Python. Performance and flexibility are not important; the accuracy of the parsing and the clarity of the parser's code matter far more.
Searching online I've found a few parser generators that generate human-readable Python code, but I have not found the corresponding Python grammar to go with any of them (from what I could see, they all follow different grammar specification conventions). At any rate, even if I could find a suitable parser-generator/Python grammar combination, a ready-made Python parser that meets my requirements (human-readable Python code) would naturally be far preferable.
Any suggestions?
Thanks!
PyPy is a Python implementation written entirely in Python. I am not an expert, but here's the link to their parser, which is, naturally, written in Python itself:
https://bitbucket.org/pypy/pypy/src/819faa2129a8/pypy/interpreter/pyparser
I think you should invest your effort in ast. Here is an excerpt from the Python docs:
The ast module helps Python applications to process trees of the
Python abstract syntax grammar. The abstract syntax itself might
change with each Python release; this module helps to find out
programmatically what the current grammar looks like.
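To get a feel for what the ast module returns, here is a small sketch using only the standard library (Python 3.8+ assumed, where literals appear as `Constant` nodes). It parses a snippet and collects the node types that `ast.walk` visits; `node_types` is a hypothetical helper name. `ast.dump` prints the full tree, though its exact text differs between Python versions.

```python
import ast

def node_types(source):
    """Parse Python source and return the names of the AST node types in it."""
    tree = ast.parse(source)
    # ast.walk visits every node but not in source order, so return a set.
    return {type(node).__name__ for node in ast.walk(tree)}

if __name__ == "__main__":
    tree = ast.parse("x = 1 + 2")
    print(ast.dump(tree))  # full tree; formatting varies by Python version
    print(sorted(node_types("x = 1 + 2")))
```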
My task is to add a switch statement to Python and remove the mandatory colons from functions, classes, and loops.
Maybe also add some other nice features from CoffeeScript.
The .py files with the custom syntax must be imported by the Python interpreter, then parsed with a custom parser (just like the CoffeeScript compiler does).
(I already have a little experience adding Python-like "for" syntax to an existing custom parser and fixing several bugs in it. But it takes a long time to read and understand all the code, so I decided to ask for advice first.)
I searched the internet for a long time and found several helpful answers, but I still don't know the best way to implement it.
Some of what I found:
Parse a .py file, read the AST, modify it, then write back the modified source code
Python's tokenize module
Python's ast module
A C-like preprocessor for Python with an import hook
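The first approach in that list (parse the source, modify the AST, write the source back) can be sketched with the standard library alone. This assumes Python 3.9+ for `ast.unparse`; the `RenameVariable` transformer is a hypothetical example edit, not code from any of the linked answers.

```python
import ast

class RenameVariable(ast.NodeTransformer):
    """Rename every use of one variable name to another."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        if node.id == self.old:
            # Replacement node keeps the original line/column information.
            return ast.copy_location(ast.Name(id=self.new, ctx=node.ctx), node)
        return node

def rename_variable(source, old, new):
    """Parse source, rewrite the tree, and turn it back into source text."""
    tree = ast.parse(source)
    tree = RenameVariable(old, new).visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+

if __name__ == "__main__":
    print(rename_variable("x = 1\nprint(x + 2)", "x", "y"))
```

Note that the AST does not record comments or the original formatting, so `ast.unparse` cannot reproduce them.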
What I plan to do:
Rewrite the CoffeeScript parser or the Python parser in pure Python
Write an import hook that parses files into an AST with my own parser.
Continue the import (compile the AST and load it as a module)
(like CoffeeScript does it)
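The import-hook part of this plan can be sketched with `importlib` (Python 3.4+ APIs). Everything named here is hypothetical: the `.dsl` extension, `CustomSyntaxFinder`, `CustomSyntaxLoader`, and `parse_custom`, which is only a placeholder that delegates to `ast.parse` where your own parser would go. The idea is that a custom parser only has to produce an `ast.Module`; `compile()` and the normal module machinery handle the rest.

```python
import ast
import importlib.abc
import importlib.util
import os

def parse_custom(source, filename):
    """Hypothetical placeholder for a custom parser.

    A real version would accept the extended syntax and build an ast.Module;
    this one simply delegates to Python's own parser.
    """
    return ast.parse(source, filename=filename)

class CustomSyntaxLoader(importlib.abc.Loader):
    """Load a module by parsing a custom-syntax file into an AST."""

    def __init__(self, path):
        self.path = path

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        with open(self.path, encoding="utf-8") as f:
            source = f.read()
        tree = parse_custom(source, self.path)
        code = compile(tree, self.path, "exec")
        exec(code, module.__dict__)

class CustomSyntaxFinder(importlib.abc.MetaPathFinder):
    """Find top-level modules stored as <name><extension> in one directory."""

    def __init__(self, directory, extension=".dsl"):
        self.directory = directory
        self.extension = extension

    def find_spec(self, fullname, path, target=None):
        if path is not None:
            return None  # this sketch only handles top-level modules
        candidate = os.path.join(self.directory, fullname + self.extension)
        if not os.path.isfile(candidate):
            return None  # defer to the normal import system
        loader = CustomSyntaxLoader(candidate)
        return importlib.util.spec_from_file_location(fullname, candidate, loader=loader)
```

To activate it, something like `sys.meta_path.insert(0, CustomSyntaxFinder("/path/to/dsl/files"))` would make `import mymodule` load `mymodule.dsl` from that directory.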
So I have the following questions:
- Is there a Python parser written in Python (so that I don't have to rewrite the whole CoffeeScript parser)?
- Is there any way to build ast.AST objects from my own parser without rewriting the ast library from C into Python?
- How can I do this better and more easily? (Modifying Python's sources is not an option: everything must be done at runtime and be fully compatible with all other Python interpreters.)
- Are there already libraries that help with modifying Python's syntax?
Thank you very much.
Best regards, Serj.
Long story short, a piece of code that I'm working with at work has the line:
from System import System
with a later bit of code of:
desc_ = System()
xmlParser = Parser(desc_.getDocument())
# xmlParser.setEntityBase(self.dtdBase)
for featureXMLfile in featureXmlList.split(","):
    print featureXMLfile
    xmlParser.parse(featureXMLfile)
feat = desc_.get(featureName)
return feat
Parser is an XML parser in Java (it's included in a different import), but I don't get what the desc_ bit is doing. I mean obviously, it somehow holds the feature that we're trying to pull out, but I don't entirely see where. Is System a standard library in Python or Java, or am I looking at something custom?
Unfortunately, everyone else in my group is out for Christmas Eve vacation, so I can't ask them directly. Thank you for your help. I'm still not horribly familiar with Python.
This isn't from the standard library, so you'll need to check your system (Python has plenty of introspection to help you with that).
You can tell because Python's standard-library modules use lowercase names, per PEP 8, or by searching the library reference.
Note also that Python has its own XML parsing tools, which will be much nicer to work with from Python than Java's.
Edit: Since you noted in the comments that you are using Jython, it seems likely this is Java's System package.
millimoose indicated the correct answer in his comment, but neglected to submit it as an answer, so I'm posting to indicate the correct answer. It was indeed a custom module built by my company. I was able to determine this by typing import System; print(System) into the interpreter.
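The introspection trick above can be taken a little further. This sketch targets a modern CPython 3 interpreter rather than Jython 2; it assumes Python 3.10+ for `sys.stdlib_module_names` and reports `None` for the standard-library check on older versions. `module_origin` is a hypothetical helper name.

```python
import importlib
import sys

def module_origin(name):
    """Report whether a module is in the standard library and where it loads from."""
    # sys.stdlib_module_names only exists on Python 3.10+.
    stdlib_names = getattr(sys, "stdlib_module_names", None)
    in_stdlib = None if stdlib_names is None else name.split(".")[0] in stdlib_names
    try:
        module = importlib.import_module(name)
    except ImportError:
        return {"name": name, "stdlib": in_stdlib, "location": None}
    if name in sys.builtin_module_names:
        location = "built-in"  # compiled into the interpreter, no file
    else:
        location = getattr(module, "__file__", None)
    return {"name": name, "stdlib": in_stdlib, "location": location}

if __name__ == "__main__":
    for name in ("json", "sys", "System"):
        print(module_origin(name))
```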
I am writing an application which reads an input file that currently has its own grammar, which is processed by lex/yacc.
I'm looking to modify this so as to make this input file a Python script instead, and was wondering if someone can point me to a beginner's guide to using the parser module in Python. I'm fairly new to Python itself, but have worked through a fair chunk of the online tutorial.
From what I have researched, I know there are options (such as pyparsing) that would let me keep the existing grammar and use pyparsing as a replacement for lex/yacc. However, I am curious to learn the Python parser module in more detail and explore its feasibility.
Thanks.
You mean the parser module? It's a parser for Python source code only, not a general purpose parser. You can't use it to parse anything else.
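A quick way to see this is to hand non-Python input to Python's parser. The parser module itself was deprecated in Python 3.9 and removed in 3.10, so this sketch uses `ast.parse`, which fills the same role today; `is_valid_python` and the sample inputs are made up for illustration.

```python
import ast

def is_valid_python(source):
    """Return True if source parses as Python, False on a syntax error."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True

if __name__ == "__main__":
    print(is_valid_python("width = 10\nheight = width * 2\n"))  # Python syntax
    print(is_valid_python("set width 10;\n"))  # a lex/yacc-style line
```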
As Jochen said, the parser module is for parsing Python code. I think you're best off checking out Ned Batchelder's list of parsers. PyParsing does things pretty differently from Lex and Yacc, so I'm not sure why you think you could keep your existing grammar and lexer. A better bet might be David Beazley's PLY toolkit. It's solid and has excellent documentation.
I recommend that you check out https://github.com/erezsh/lark
It's great for newcomers to parsing: It can parse ALL context-free grammars, it automatically builds an AST (with line & column numbers), and it accepts the grammar in EBNF format, which is considered the standard and is very easy to write.
I'm working on a domain-specific language implemented on top of Python. The grammar is so close to Python's that until now we've just been making a few trivial string transformations and then feeding it into ast. For example, indentation is replaced by #endfor/#endwhile/#endif statements, so we normalize the indentation while it's still a string.
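For reference, the kind of string transformation described above might look like this. It is a guess at the DSL's shape, not its actual rules: it assumes a block opens with a line ending in `:`, closes with `#endfor`, `#endwhile`, or `#endif`, and that `else`/`elif` sit one level out from their body. `normalize_indentation` and `to_ast` are hypothetical names.

```python
import ast

BLOCK_END_MARKERS = {"#endfor", "#endwhile", "#endif"}
DEDENTING_KEYWORDS = {"else", "elif"}

def first_word(line):
    """Return the leading keyword of a line, ignoring a trailing colon."""
    return line.split()[0].rstrip(":")

def normalize_indentation(source, indent="    "):
    """Rebuild Python indentation from #end* block markers."""
    depth = 0
    out = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in BLOCK_END_MARKERS:
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced block end marker: " + line)
            continue
        if first_word(line) in DEDENTING_KEYWORDS:
            # else/elif line up with the statement that opened the block.
            out.append(indent * (depth - 1) + line)
            continue
        out.append(indent * depth + line)
        if line.endswith(":"):
            depth += 1
    if depth != 0:
        raise ValueError("missing block end marker")
    return "\n".join(out) + "\n"

def to_ast(source):
    """Normalize the DSL text, then hand it to Python's own parser."""
    return ast.parse(normalize_indentation(source))
```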
I'm wondering whether there's a better way. As far as I can tell, ast is hardcoded to parse the Python grammar, and I can't find any documentation other than http://docs.python.org/library/ast.html#module-ast (and the source itself, I suppose).
Does anyone have personal experience with PyParsing, ANTLR, or PLY?
There are vague plans to rewrite the interpreter into something that transforms our language into valid Python and feeds that into the Python interpreter itself, so I'd like something compatible with compile, but this isn't a deal breaker.
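On the compile point: `compile()` accepts an AST object as well as a string, so a front end that builds `ast` nodes directly (instead of emitting Python source text) stays compatible with it. A minimal sketch, assuming Python 3.8+ (where `ast.Module` takes `type_ignores` and literals are `ast.Constant`); `build_assignment` is a hypothetical stand-in for your own parser's output.

```python
import ast

def build_assignment(name, value):
    """Build the AST for `name = value` by hand, as a custom parser might."""
    assign = ast.Assign(
        targets=[ast.Name(id=name, ctx=ast.Store())],
        value=ast.Constant(value=value),
    )
    module = ast.Module(body=[assign], type_ignores=[])
    # compile() needs line/column info on every node; fill in defaults.
    return ast.fix_missing_locations(module)

def run_tree(tree):
    """Compile an AST module and execute it in a fresh namespace."""
    code = compile(tree, filename="<dsl>", mode="exec")
    namespace = {}
    exec(code, namespace)
    return namespace

if __name__ == "__main__":
    print(run_tree(build_assignment("answer", 42))["answer"])
```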
Update: It just occurred to me that
from __future__ import print_function, with_statement
changes the way Python parses the following source. However, PEP 236 suggests that this is syntactic window dressing for a compiler feature. Could someone confirm that trying to override/extend __future__ is not the correct solution to my problem?
PLY works. It's odd because it mimics lex/yacc in a way that's not terribly pythonic.
Both lex and yacc have an implicit interface that makes it possible to run the output from lex as a stand-alone program, and PLY carefully preserves this "feature". The same goes for PLY's yacc-like features: the "feature" of creating a weird, implicit stand-alone main program is carefully preserved.
However, as a lex/yacc-compatible toolset, PLY is quite nice. All your lex/yacc skills carry over.
[Editorial Comment. "Fixing" Python's grammar will probably be a waste of time. Almost everyone can indent correctly without any help. Check C, Java, C++ and even Pascal code, and you'll see that almost everyone can indent really well. Indeed, people go to great lengths to indent Java where it's not needed. If indentation is unimportant in Java, why do people do such a good job of it?]
I'm creating a tree to represent a simple language. I'm very familiar with Abstract Syntax Trees, and have worked on frameworks for building and using them in C++. Is there a standard Python library for specifying or manipulating arbitrary ASTs? Failing that, is there a tree library that is useful for the same purpose?
Note, I am not manipulating Python ASTs, so I think the AST module isn't suitable.
ASTs are very simple to implement in Python. For example, for my pycparser project (a complete C parser in Python) I've implemented ASTs based on ideas borrowed from Python's modules. The various AST nodes are specified in a YAML configuration file, and I generate Python code for these nodes in Python itself.
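A rough sketch of the same idea, not pycparser's actual code: the node specification here is a plain dict instead of a YAML file (to avoid a third-party dependency), and `Node`, `NODE_SPEC`, `generate_node_source`, and `build_node_classes` are hypothetical names. Each class is generated as Python source text and then executed.

```python
# Hypothetical node specification: class name -> list of field names.
NODE_SPEC = {
    "Constant": ["value"],
    "BinaryOp": ["op", "left", "right"],
}

class Node:
    """Base class shared by all generated AST node classes."""
    __slots__ = ()
    fields = ()

    def children(self):
        """Return (field name, child) pairs for fields that hold other nodes."""
        return [(name, getattr(self, name)) for name in self.fields
                if isinstance(getattr(self, name), Node)]

    def __repr__(self):
        args = ", ".join(f"{name}={getattr(self, name)!r}" for name in self.fields)
        return f"{type(self).__name__}({args})"

def generate_node_source(name, fields):
    """Generate the Python source code for one node class."""
    params = "".join(", " + field for field in fields)
    lines = [
        f"class {name}(Node):",
        f"    __slots__ = {tuple(fields)!r}",
        f"    fields = {tuple(fields)!r}",
        f"    def __init__(self{params}):",
    ]
    lines += [f"        self.{field} = {field}" for field in fields] or ["        pass"]
    return "\n".join(lines) + "\n"

def build_node_classes(spec):
    """Execute the generated source and return the new classes by name."""
    namespace = {"Node": Node, "__name__": "generated_nodes"}
    for name, fields in spec.items():
        exec(generate_node_source(name, fields), namespace)
    return {name: namespace[name] for name in spec}
```

Generating source text (rather than building classes with `type()`) has the side benefit that the generated module can be written to disk and read like hand-written code.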
pyast is a package for building declarative abstract syntax trees.
If you represent your grammar elements as expressions in pyparsing, you can attach a parse action to each expression which returns a class instance containing the parsed tokens in a parser-specific type. There are a couple of examples on the pyparsing wiki that illustrate this technique (invRegex.py, simpleBool.py and evalArith.py). (These grammars all use the operatorPrecedence built-in, which can obscure some of the grammar structure, but
This blog entry, though short on implementation detail, describes a nice interface that Python ASTs could implement.
http://chris-lamb.co.uk/2006/12/08/visitor-pattern-in-python/
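For comparison, here is a common Python take on the visitor pattern, dispatching on "visit_" plus the node's class name the same way `ast.NodeVisitor` does. It is a generic sketch, not the interface from the linked post; `Num`, `Add`, `Visitor`, `Evaluator`, and `Printer` are hypothetical.

```python
class Num:
    def __init__(self, value):
        self.value = value

class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right

class Visitor:
    """Dispatch to visit_<ClassName> methods, like ast.NodeVisitor."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        raise NotImplementedError("no visitor for " + type(node).__name__)

class Evaluator(Visitor):
    """Compute the numeric value of an expression tree."""

    def visit_Num(self, node):
        return node.value

    def visit_Add(self, node):
        return self.visit(node.left) + self.visit(node.right)

class Printer(Visitor):
    """Render an expression tree as fully parenthesized text."""

    def visit_Num(self, node):
        return str(node.value)

    def visit_Add(self, node):
        return "(" + self.visit(node.left) + " + " + self.visit(node.right) + ")"
```

Each new operation over the tree becomes a new `Visitor` subclass, so the node classes themselves never need to change.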