What parser generator does CPython use? - python

I was reading this page in the documentation, and noticed that it says
This is the full Python grammar, as it is read by the parser generator
and used to parse Python source files
However, I'm having difficulty finding out what parser generator CPython uses. So what parser generator does CPython use? Are there other parser generators that would take the grammar on that page without any modifications?

Python is open-source, so you can inspect the source code...
In the Python source directory is a "Parser" directory containing "Python.asdl" with the note
-- ASDL's four builtin types are identifier, int, string, object
There's also an "asdl.py" file in the same directory...
"""An implementation of the Zephyr Abstract Syntax Definition Language.
See http://asdl.sourceforge.net/ and
http://www.cs.princeton.edu/research/techreps/TR-554-97
Only supports top level module decl, not view. I'm guessing that view
is intended to support the browser and I'm not interested in the
browser.
Changes for Python: Add support for module versions
"""
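So it appears to be a custom parser generator. Note that Python.asdl and asdl.py describe the AST node types rather than the concrete grammar; the grammar file itself was read by CPython's own generator, pgen, which produced an LL(1) parser (CPython 3.9 replaced it with a PEG-based generator). Simple parser generators like that are not so hard to write.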

Related

Add own realtime custom parser to Python to generate and compile AST

My task is to add a switch statement to Python and remove the mandatory colons from functions, classes and loops.
Maybe also add some other nice features from Coffeescript.
The .py files with custom syntax must be imported by the Python interpreter and then parsed with a custom parser (just like the Coffeescript compiler does).
(I already have a little experience adding a Python-like "for" syntax to an existing custom parser and fixing several bugs in it. But it takes a long time to read and understand all the code, so I decided to ask for advice first.)
I searched the internet for a long time and found several helpful answers, but I still don't know the best way to implement it.
Some of what I found:
Parse a .py file, read the AST, modify it, then write back the modified source code
Python's tokenize module
Python's ast module
Python's c-like preprocessor with import hook
What I think to do:
Rewrite Coffeescript parser or Python parser into pure Python
Make import hook to parse files to AST by my own parser.
Continue import (compile AST and import it to module)
(like Coffeescript does it)
So I have these questions:
- Is there a Python parser written in Python (so that I don't have to rewrite the whole Coffeescript parser)?
- Is there any way to build ast.AST objects from my own parser without rewriting the ast library from C into Python?
- How can I do this better and more easily? (Modifying Python's sources is not an option; everything must be done at runtime and be fully compatible with all other Python interpreters.)
- Are there already libraries that help with modifying Python's syntax?
Thank you very much.
Best regards, Serj.
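The plan listed under "What I think to do" can be sketched with the standard library alone. This is a minimal sketch, not a full solution: translate() is a hypothetical, deliberately naive stand-in for a real custom parser (it only restores a missing colon on lines that open a block, and would break one-line blocks such as "for x in y: print(x)"), and load_custom_module() and the module name are made up for illustration. A real import hook would wrap the same translate, ast.parse, compile steps in an importlib loader, for example by overriding source_to_code().

```python
import ast
import sys
import types

# Hypothetical rule set: block openers whose trailing colon the dialect omits.
BLOCK_KEYWORDS = ("def ", "class ", "for ", "while ")


def translate(source):
    """Naive stand-in for a custom parser: restore the omitted colons."""
    lines = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(BLOCK_KEYWORDS) and not stripped.endswith(":"):
            line = line.rstrip() + ":"
        lines.append(line)
    return "\n".join(lines) + "\n"


def load_custom_module(name, source):
    """Translate, parse to an AST, compile the AST, and run it as a module."""
    tree = ast.parse(translate(source), filename=name)
    code = compile(tree, name, "exec")
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    sys.modules[name] = module  # register it so later lookups by name find it
    return module


demo = load_custom_module(
    "colonless_demo",
    "def double(x)\n"
    "    return 2 * x\n",
)
print(demo.double(21))
```

Because the AST comes from ast.parse, nothing in the ast library has to be rewritten; only the text-to-text translation step is custom.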

Parsing a .proto file without creating the descriptor

I understand that the normal way of using protobuf is to write the .proto file and then compile it into the relevant class - Java, Python, etc. I have a requirement that might need the .proto file to be parsed in Python code. Has anyone tried writing their own parser for .proto files? Is it recommended to always compile the class instead of parsing the .proto directly?
It probably won't help you directly, but yes, I've written my own parser (live demo, parser source). The code is C#, which is why it probably won't help, but it clearly is possible. I started that branch 9 days ago, and it is now basically feature-complete, including parser, generator, and an interactive website with syntax-error highlighting - so it isn't necessarily a huge amount of work.
However! You may find it easier just to shell execute "protoc" (available on maven). If you use the -oFILE / --descriptor_set_out=FILE switch (same thing, alternative syntax), then it parses the input .proto file and writes a file that is a serialized FileDescriptorSet from descriptor.proto. This means you can use your regular tools to generate code in your chosen language for descriptor.proto, then deserialize the file as a FileDescriptorSet instance. Once you've done that: you can just walk the object model to see the files, messages, enums, fields, etc. IIRC some protobuf implementations support working entirely from a descriptor (which is what protoc emits), without the codegen step.

Is it safe to parse the Abstract Syntax Trees of untrusted code?

Is it ok to use the ast module to parse and modify untrusted external Python code programmatically?
I will just parse the source code, get some info from the source code (docstrings, function definitions, maybe, I don't know) and leave it there, not compile it or run it.
If you're using the ast.parse function then it should be safe. As the documentation says, this function will
Parse the source into an AST node. Equivalent to compile(source, filename, mode, ast.PyCF_ONLY_AST)
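which simply parses the source and raises SyntaxError if it is not valid Python. It doesn't do any sort of evaluation. The one caveat in the documentation is that a sufficiently large or complex string can crash the interpreter because of stack depth limits, so untrusted input can still cause a denial of service.
If your aim is to evaluate expressions, then you can use ast.literal_eval, which is safer than the built-in eval function because it only accepts literals and containers of literals.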
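As a minimal sketch of the "parse and inspect, never run" use case from the question, the snippet below pulls function names and docstrings out of a source string. The sample source and the names UNTRUSTED and functions are made up for illustration; only ast.parse, ast.walk and ast.get_docstring are used.

```python
import ast

# Made-up sample of untrusted code. It is only ever parsed, never executed.
UNTRUSTED = '''
import os

def cleanup(path):
    """Delete the file at path."""
    os.remove(path)

cleanup("important.txt")
'''

tree = ast.parse(UNTRUSTED)  # builds the tree; nothing in UNTRUSTED runs

# Map each function name to its docstring (None if it has none).
functions = {
    node.name: ast.get_docstring(node)
    for node in ast.walk(tree)
    if isinstance(node, ast.FunctionDef)
}
print(functions)
```

The call to cleanup() at the bottom of the sample shows up as a node in the tree, but it is never invoked.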
"Unsafe" implies that something bad could happen under the control of the artifact you are handling. Parsing only builds an AST, so as long as there is no exploitable flaw in the parsing and AST-building code itself, parsing an arbitrary piece of text can't hurt you.
Typically to get malicious behaviour from the outside, something (controlled by you) must essentially execute some supplied code. Clearly building a parse tree doesn't execute the outside program. However, if you built an interpreter that interpreted the parse tree and ran it, you might have a problem.
I believe so. No code is executed. In fact, parsing the source into an AST is the first thing ast.literal_eval does (it then walks the tree and accepts only literal nodes), and that's deemed safe.
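A short sketch of that difference, assuming nothing beyond the standard library: ast.literal_eval accepts a literal structure but refuses an expression that would need a function call. The names value and rejected are illustrative only.

```python
import ast

# A plain literal structure is evaluated without running any code.
value = ast.literal_eval("{'a': [1, 2, 3], 'b': (4.5, None)}")
print(value)

# Anything that is not a literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```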

ISO human-readable parser for Python in Python

I'm looking for a parser for Python (preferably v. 2.7) written in human-readable Python. Performance or flexibility are not important. The accuracy/correctness of the parsing and the clarity of the parser's code are far more important considerations here.
Searching online I've found a few parser generators that generate human-readable Python code, but I have not found the corresponding Python grammar to go with any of them (from what I could see, they all follow different grammar specification conventions). At any rate, even if I could find a suitable parser-generator/Python grammar combo, a readily available Python parser that fits my requirements (human-readable Python code) is naturally far more preferable.
Any suggestions?
Thanks!
PyPy is a Python implementation written entirely in Python. I am not an expert, but here's the link to their parser which - obviously - has been written in Python itself:
https://bitbucket.org/pypy/pypy/src/819faa2129a8/pypy/interpreter/pyparser
I think you should invest your effort in ast. An excerpt from the python docs.
The ast module helps Python applications to process trees of the
Python abstract syntax grammar. The abstract syntax itself might
change with each Python release; this module helps to find out
programmatically what the current grammar looks like.
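To make the excerpt concrete, here is a small sketch of inspecting the abstract syntax from Python itself. It is written for Python 3 (the question prefers 2.7, where the same calls exist but the dump output differs); the sample statement is arbitrary.

```python
import ast

tree = ast.parse("x = 1 + 2")

# Human-readable view of the whole tree; the exact text varies by Python version.
print(ast.dump(tree))

assign = tree.body[0]   # the single statement in the module
binop = assign.value    # the right-hand side expression

# Every node class lists its child fields, which mirrors the abstract grammar.
print(type(binop).__name__, ast.BinOp._fields)
```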

Library for programming Abstract Syntax Trees in Python

I'm creating a tree to represent a simple language. I'm very familiar with Abstract Syntax Trees, and have worked on frameworks for building and using them in C++. Is there a standard python library for specifying or manipulating arbitrary ASTs? Failing that, is there a tree library which is useful for the same purpose?
Note, I am not manipulating Python ASTs, so I think the AST module isn't suitable.
ASTs are very simple to implement in Python. For example, for my pycparser project (a complete C parser in Python) I've implemented ASTs based on ideas borrowed from Python's modules. The various AST nodes are specified in a YAML configuration file, and I generate Python code for these nodes in Python itself.
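A minimal sketch of that idea, not pycparser's actual code: node classes are generated from a declarative specification. The spec here is a hypothetical in-memory dict rather than a YAML file, and the names NODE_SPEC, Node and make_node_classes are made up for illustration.

```python
# Hypothetical specification: node name -> ordered field names.
NODE_SPEC = {
    "Num": ["value"],
    "BinOp": ["op", "left", "right"],
}


class Node:
    """Base class; generated subclasses only differ in their `fields`."""

    fields = ()

    def __init__(self, *args):
        if len(args) != len(self.fields):
            raise TypeError(
                "%s expects %d arguments" % (type(self).__name__, len(self.fields))
            )
        for name, value in zip(self.fields, args):
            setattr(self, name, value)

    def children(self):
        """Return the field values that are themselves nodes."""
        values = (getattr(self, name) for name in self.fields)
        return [v for v in values if isinstance(v, Node)]

    def __repr__(self):
        args = ", ".join(repr(getattr(self, name)) for name in self.fields)
        return "%s(%s)" % (type(self).__name__, args)


def make_node_classes(spec):
    """Create one Node subclass per entry in the specification."""
    return {
        name: type(name, (Node,), {"fields": tuple(fields)})
        for name, fields in spec.items()
    }


classes = make_node_classes(NODE_SPEC)
Num, BinOp = classes["Num"], classes["BinOp"]

tree = BinOp("+", Num(1), Num(2))
print(tree)
```

Adding a node type then only means adding a line to the specification.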
pyast is a package for building declarative abstract syntax trees.
If you represent your grammar elements as expressions in pyparsing, you can attach a parse action to each expression which returns a class instance containing the parsed tokens in a parser-specific type. There are a couple of examples on the pyparsing wiki that illustrate this technique (invRegex.py, simpleBool.py and evalArith.py). (These grammars all use the operatorPrecedence built-in, which can obscure some of the grammar structure, but
This blog entry, though short on implementation detail, describes a nice interface that Python ASTs could implement.
http://chris-lamb.co.uk/2006/12/08/visitor-pattern-in-python/
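For a tree that is not a Python AST, a visitor can be sketched in a few lines. This is not the code from the linked post; it borrows the visit_<ClassName> dispatch convention used by ast.NodeVisitor, and the node classes Num, Add and Mul are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: float


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Mul:
    left: object
    right: object


class Visitor:
    """Dispatch to visit_<ClassName>, falling back to generic_visit."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        raise NotImplementedError("no visitor for " + type(node).__name__)


class Evaluator(Visitor):
    """Example visitor that computes the value of an arithmetic tree."""

    def visit_Num(self, node):
        return node.value

    def visit_Add(self, node):
        return self.visit(node.left) + self.visit(node.right)

    def visit_Mul(self, node):
        return self.visit(node.left) * self.visit(node.right)


# Represents 1 + 2 * 3
result = Evaluator().visit(Add(Num(1), Mul(Num(2), Num(3))))
print(result)
```

Other passes (pretty-printing, type checking) become further Visitor subclasses without touching the node classes.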
