I am writing an application which reads an input file that currently has its own grammar, which is processed by lex/yacc.
I'm looking to modify this so as to make this input file a Python script instead, and was wondering if someone can point me to a beginner's guide to using the parser module in Python. I'm fairly new to Python itself, but have worked through a fair chunk of the online tutorial.
From what I have researched, I know there are options (such as pyparsing) that would let me keep the existing grammar, using pyparsing as a replacement for lex/yacc. However, I am curious to learn the Python parser module in more detail and explore its feasibility.
Thanks.
You mean the parser module? It's a parser for Python source code only, not a general-purpose parser. You can't use it to parse anything else.
As Jochen said, the parser module is for parsing Python code. I think you're best off checking out Ned Batchelder's list of parsers. PyParsing does things pretty differently from Lex and Yacc, so I'm not sure why you think you could keep your existing grammar and lexer. A better bet might be David Beazley's PLY toolkit. It's solid and has excellent documentation.
I recommend that you check out https://github.com/erezsh/lark
It's great for newcomers to parsing: It can parse ALL context-free grammars, it automatically builds an AST (with line & column numbers), and it accepts the grammar in EBNF format, which is considered the standard and is very easy to write.
My task is to add a switch statement to Python and to remove the mandatory colons after functions, classes, and loops.
Maybe also to add some other nice features from CoffeeScript.
The .py files with custom syntax must be importable by the Python interpreter and then parsed with a custom parser (just like the CoffeeScript compiler does).
(I already have a little experience here: I added a Python-like "for" syntax to an existing custom parser and fixed several bugs. But it takes a long time to read all the code and understand it, so I decided to ask for advice first.)
I searched the internet for a long time and found several helpful answers, but I still don't know the best way to implement this.
Some from what I found:
Parse a .py file, read the AST, modify it, then write back the modified source code
Python's tokenize module
Python's ast module
Python's c-like preprocessor with import hook
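Of those options, the tokenize route is the easiest to sketch. As a stand-in for a real syntax change, this toy example renames the identifier `x` to `y` at the token level and checks that the result is still valid Python:

```python
import io
import tokenize

src = "x = 1\nprint(x + 1)\n"

# Re-emit every token, swapping the NAME token "x" for "y".
tokens = tokenize.generate_tokens(io.StringIO(src).readline)
rewritten = [
    (tok.type, "y" if tok.type == tokenize.NAME and tok.string == "x" else tok.string)
    for tok in tokens
]

out = tokenize.untokenize(rewritten)
compile(out, "<rewritten>", "exec")  # still compiles as normal Python
```

Note that `untokenize` only guarantees equivalent code, not identical whitespace, when given (type, string) pairs.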
What I plan to do:
Rewrite the CoffeeScript parser, or a Python parser, in pure Python
Make an import hook that parses files into an AST with my own parser
Continue the import (compile the AST and import it as a module)
(like CoffeeScript does it)
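The import-hook part of that plan can be sketched with the standard `importlib` machinery. In this skeleton, `transform` is a hypothetical placeholder for the real custom-syntax-to-Python translation:

```python
import importlib.abc
import importlib.machinery


def transform(source):
    # Placeholder for the real custom-syntax -> Python translation step.
    return source


class TransformLoader(importlib.machinery.SourceFileLoader):
    """Compile a custom-syntax .py file by translating its source first."""

    def source_to_code(self, data, path, *, _optimize=-1):
        source = transform(data.decode("utf-8"))
        return compile(source, path, "exec", dont_inherit=True)


class TransformFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, name, path, target=None):
        spec = importlib.machinery.PathFinder.find_spec(name, path)
        if spec is not None and spec.origin and spec.origin.endswith(".py"):
            spec.loader = TransformLoader(spec.name, spec.origin)
        return spec


# To activate, install the finder at interpreter startup, e.g.:
# import sys; sys.meta_path.insert(0, TransformFinder())
```

Because this runs entirely at import time, it stays within the "no modification of Python's sources" constraint, though other interpreters must also support the importlib finder/loader protocol.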
So I have such questions:
- Is there a Python parser written in Python itself (so that I don't have to rewrite the whole CoffeeScript parser)?
- Is there any way to build ast.AST nodes from my own parser without rewriting the ast library from C into Python?
- How can I do this better and more easily? (Modifying Python's sources is not an option; everything must be done at runtime and be fully compatible with all other Python interpreters.)
- Are there already libraries that help with modifying Python's syntax?
Thank you very much.
Best regards, Serj.
I'm looking for a parser for Python (preferably v. 2.7) written in human-readable Python. Performance or flexibility are not important. The accuracy/correctness of the parsing and the clarity of the parser's code are far more important considerations here.
Searching online I've found a few parser generators that generate human-readable Python code, but I have not found the corresponding Python grammar to go with any of them (from what I could see, they all follow different grammar specification conventions). At any rate, even if I could find a suitable parser-generator/Python grammar combo, a readily available Python parser that fits my requirements (human-readable Python code) is naturally far more preferable.
Any suggestions?
Thanks!
PyPy is a Python implementation written entirely in Python. I am not an expert, but here's the link to their parser, which has obviously been written in Python itself:
https://bitbucket.org/pypy/pypy/src/819faa2129a8/pypy/interpreter/pyparser
I think you should invest your effort in ast. An excerpt from the Python docs:

The ast module helps Python applications to process trees of the Python abstract syntax grammar. The abstract syntax itself might change with each Python release; this module helps to find out programmatically what the current grammar looks like.
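A short sketch of the parse-transform-compile round trip (the transformation itself is made up just to show the machinery):

```python
import ast

source = "total = 1 + 2\n"
tree = ast.parse(source)


# Illustrative transform: rewrite every addition into a multiplication.
class AddToMult(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


tree = ast.fix_missing_locations(AddToMult().visit(tree))
ns = {}
exec(compile(tree, "<ast-demo>", "exec"), ns)
print(ns["total"])  # 2, because 1 + 2 became 1 * 2
```

`fix_missing_locations` is what lets freshly created nodes pass `compile`, which otherwise requires line/column information on every node.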
I'm working on a domain-specific language implemented on top of Python. The grammar is so close to Python's that until now we've just been making a few trivial string transformations and then feeding it into ast. For example, indentation is replaced by #endfor/#endwhile/#endif statements, so we normalize the indentation while it's still a string.
I'm wondering if there's a better way? As far as I can tell, ast is hardcoded to parse the Python grammar and I can't really find any documentation other than http://docs.python.org/library/ast.html#module-ast (and the source itself, I suppose).
Does anyone have personal experience with PyParsing, ANTLR, or PLY?
There are vague plans to rewrite the interpreter into something that transforms our language into valid Python and feeds that into the Python interpreter itself, so I'd like something compatible with compile, but this isn't a deal breaker.
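For concreteness, the kind of string transformation described above can be sketched like this. This is a toy version with assumptions baked in: it treats any line ending in a colon as a block opener and ignores else/elif, dict literals, and other colon-bearing lines:

```python
import ast


def preprocess(src):
    """Toy sketch: turn flat '#endfor'-terminated source back into
    indented Python before handing it to ast."""
    out, depth = [], 0
    for line in src.splitlines():
        stripped = line.strip()
        if stripped in ("#endfor", "#endwhile", "#endif"):
            depth -= 1
            continue
        out.append("    " * depth + stripped)
        if stripped.endswith(":"):
            depth += 1
    return "\n".join(out) + "\n"


flat = "total = 0\nfor i in range(3):\ntotal = total + i\n#endfor\n"
tree = ast.parse(preprocess(flat))  # feeds straight into ast/compile
```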
Update: It just occurred to me that
from __future__ import print_function, with_statement
changes the way Python parses the following source. However, PEP 236 suggests that this is syntactic window dressing for a compiler feature. Could someone confirm that trying to override/extend __future__ is not the correct solution to my problem?
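The compiler-directive nature of `__future__` is easy to see directly: each feature exposes a `compiler_flag` that changes how `compile` treats the source, which is all that the `from __future__ import ...` line does. An illustrative example using the `annotations` feature (PEP 563):

```python
import __future__

feature = __future__.annotations

# Passing the flag to compile() has the same effect as putting
# "from __future__ import annotations" at the top of the source.
code = compile("x: SomeUndefinedName = 1", "<demo>", "exec",
               flags=feature.compiler_flag)
ns = {}
exec(code, ns)

# With the flag, the annotation is stored as a string and never
# evaluated, so the undefined name is harmless.
print(ns["__annotations__"]["x"])
```

Since the set of available flags is fixed by the compiler, you can't extend `__future__` to introduce new syntax of your own.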
PLY works. It's odd because it mimics lex/yacc in a way that's not terribly pythonic.
Both lex and yacc have an implicit interface that makes it possible to run the output of lex as a stand-alone program, and this "feature" of creating a weird, implicit stand-alone main program is carefully preserved in both the lex-like and the yacc-like halves of PLY.
However, PLY as lex/yacc-compatible toolset is quite nice. All your lex/yacc skills are preserved.
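To show how directly the lex/yacc style carries over (this assumes the `ply` package is installed; the grammar is a made-up toy): token rules and grammar rules are discovered from the enclosing module's namespace, much as lex/yacc scan a .l/.y file.

```python
from ply import lex, yacc

tokens = ("NUMBER", "PLUS")

t_PLUS = r"\+"
t_ignore = " "


def t_NUMBER(t):
    r"\d+"
    t.value = int(t.value)
    return t


def t_error(t):
    t.lexer.skip(1)


def p_expr_plus(p):
    "expr : expr PLUS NUMBER"
    p[0] = p[1] + p[3]


def p_expr_number(p):
    "expr : NUMBER"
    p[0] = p[1]


def p_error(p):
    pass


lexer = lex.lex()
parser = yacc.yacc(write_tables=False, debug=False)
result = parser.parse("1 + 2 + 3")
print(result)  # 6
```

Notice that, just as with yacc, the grammar productions live in docstrings and semantic values travel through the `p[...]` stack: un-pythonic, but instantly familiar to lex/yacc users.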
[Editorial Comment. "Fixing" Python's grammar will probably be a waste of time. Almost everyone can indent correctly without any help. Check C, Java, C++ and even Pascal code, and you'll see that almost everyone can indent really well. Indeed, people go to great lengths to indent Java where it's not needed. If indentation is unimportant in Java, why do people do such a good job of it?]
Hey, as a project to improve my programming skills I've begun writing a nice code editor in Python, to teach myself project management, version control, and GUI programming. I want to reuse syntax files made for other programs so that I can start with a large collection. Is there any kind of universal syntax-file format, much in the same sense as .odt for documents? I once heard of one in a forum, and it had a website, but I can't remember it now. If not, I may just try to use gedit or Geany syntax files.
thanks
If you're planning to do syntax highlighting, check out Pygments, especially the bit about lexers.
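A quick sketch of why Pygments fits here (assumes the `Pygments` package is installed): its lexers turn source text into (token-type, text) pairs, which is exactly the information a highlighter needs.

```python
from pygments import lex
from pygments.lexers import PythonLexer

# Lex some Python source into (token-type, text) pairs.
pairs = list(lex("def f():\n    return 42\n", PythonLexer()))

for tok_type, text in pairs[:4]:
    print(tok_type, repr(text))
```

Mapping each token type to a color is then a lookup table, which is what Pygments' formatters do for you.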
Since you mentioned Geany, you might want to look at the Scintilla docs. (Geany is built upon Scintilla).
You might find this post interesting.
Also, be sure to get familiar with the venerable lex and yacc.
Not sure what .odt has to do with any of this.
I could see some sort of BNF being able to describe (almost) any syntax: Just run the text and the BNF through a parser, and apply a color scheme to the terminals. You could even get a bit more fancy, since you'd have the syntax tree.
In reality, I think most syntax files take an easier approach, such as regular expressions. That would put them somewhere above regular languages, but not quite context-free, in terms of power.
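A regex-based classifier along those lines can be sketched in a few lines (the rule names and patterns here are invented for illustration):

```python
import re

# A prioritized list of (category, pattern) rules, tried in order at
# each position -- roughly how regex-driven syntax files work.
RULES = [
    ("keyword", r"\b(?:def|return|if|else|for|while)\b"),
    ("number", r"\b\d+\b"),
    ("name", r"[A-Za-z_]\w*"),
]
master = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in RULES))


def classify(src):
    """Return (category, text) pairs for every recognized span."""
    return [(m.lastgroup, m.group()) for m in master.finditer(src)]


print(classify("def f: return 42"))
```

A real highlighter would also track state (strings, comments spanning lines), which is where these formats grow beyond a single flat regex list.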
As for file formats, if you re-use something that exists, then you can just loot and pillage (subject to license agreements) their syntax file data.
What type of Python objects should I use to parse files with a specific syntax? And what sort of loop should I follow to get through the file: should one pass be sufficient, or two, or three?
It depends on the grammar. You can use pyparsing instead of implementing your own parser. It is very easy to use.
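As a taste of how little code pyparsing needs (this assumes the `pyparsing` package is installed; the "key = value" format is a made-up example):

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, nums

# A parser for simple "key = value" lines, built from combinator
# objects instead of a separate grammar file.
key = Word(alphas)
value = Word(nums)
entry = Group(key + Suppress("=") + value)
config = OneOrMore(entry)

result = config.parseString("retries = 3 timeout = 10")
print(result.asList())  # [['retries', '3'], ['timeout', '10']]
```

Whitespace skipping is automatic, which is one reason pyparsing grammars tend to stay short.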
You should give more information about your aims:
What kind of file?
What structure? Tab-separated? XML-like?
What kind of encoding?
What's the target structure?
Do you need to re-parse the file at regular intervals (like an interpreter)?
How complex is the syntax? Are you inventing a new one, or not?
For a complex language, consider lex plus Bison-style bindings such as pybison.
If you can decide what syntax to use, try YAML.
Whether your parser makes one, two, three, or n passes does not depend on your programming language (Python); it depends on the grammar of the syntax you are trying to parse.
If the syntax is complex enough, I would recommend a lex/yacc combination, as Francis said.