What type of Python objects should I use to parse files with a specific syntax? Also, what sort of loop should I use to get through the file? Is one pass sufficient, or do I need two, or three?
It depends on the grammar. You can use pyparsing instead of implementing your own parser. It is very easy to use.
You should give more information about your aims:
What kind of file is it?
What structure does it have? Tab-separated? XML-like?
What encoding does it use?
What is the target structure?
Do you need to reparse the file at regular intervals (like an interpreter)?
How complex is the syntax? Are you inventing a new one or not?
For a complex language, consider Bison bindings such as lex + PyBison.
If you can decide what syntax to use, try YAML.
Whether your parser needs one, two, three or n passes does not depend on your programming language (Python); it depends on the grammar of the syntax you are trying to parse.
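As a rough illustration of a one-pass approach for a simple, non-nested syntax, here is a minimal sketch that assumes a made-up line-oriented format (`[section]` headers, `key = value` pairs, `#` comments); the format and the `parse_config` name are hypothetical. Plain dicts hold the result, and a single `for` loop over the lines carries the only state it needs (the current section) from one line to the next.

```python
import re
from typing import Dict, Iterable

# Hypothetical syntax: "[section]" headers, "key = value" pairs, "#" comments.
SECTION_RE = re.compile(r"^\[(?P<name>[^\]]+)\]$")
PAIR_RE = re.compile(r"^(?P<key>\w+)\s*=\s*(?P<value>.*)$")


def parse_config(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Parse the hypothetical format in a single pass over the lines."""
    result: Dict[str, Dict[str, str]] = {}
    section = "default"  # state carried from one line to the next
    for lineno, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        m = SECTION_RE.match(line)
        if m:
            section = m.group("name").strip()
            result.setdefault(section, {})
            continue
        m = PAIR_RE.match(line)
        if m:
            result.setdefault(section, {})[m.group("key")] = m.group("value")
            continue
        raise SyntaxError(f"line {lineno}: cannot parse {raw!r}")
    return result
```

Because a file object iterates over its lines, `with open(path, encoding="utf-8") as f: parse_config(f)` works the same way. A second pass (or a fix-up step after the first) typically only becomes necessary when the grammar allows something to be used before it is defined, such as forward references to labels.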
If the syntax is complex enough, I would recommend the lex/yacc combination, as Francis said.
Related
Regular expressions are hard to read and difficult to debug. Is there any replacement for text processing that mere mortals could handle?
Criteria include:
It's a library or a tool (please point your answer to the library itself)
Human readable syntax (no cheatsheets needed)
Documentation with examples
Able to debug expressions
If possible, please mention both language-specific and language-independent solutions. I mainly develop in Python, but I'd like to see a library that could be ported to other languages/platforms.
I once read that Haskell has nice text-processing capabilities, but again, that is a solution built into one language, not a generic one.
Edit: Please do not answer "regular expressions are not bad, do it like this!" Stackoverflow.com is not a place for subjective opinions, but I think regular expressions are bad, and I want to see my alternatives to them.
I know this post is old, but people might benefit from this question and its answers. VerbalExpressions still uses regex behind the scenes, but in a friendlier way.
Intro: http://thechangelog.com/stop-writing-regular-expressions-express-them-with-verbal-expressions/
Python fork: https://github.com/VerbalExpressions
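To show the general idea behind such libraries, here is a hypothetical minimal builder, `ReadableRegex`, loosely in the spirit of VerbalExpressions but not its actual API (see the project's own README for that). Each method appends a regex fragment, and literals go through `re.escape`, so characters such as `.` never need manual escaping.

```python
import re
from typing import List


class ReadableRegex:
    """Hypothetical fluent builder: each method appends one regex fragment."""

    def __init__(self) -> None:
        self._parts: List[str] = []

    def start_of_line(self) -> "ReadableRegex":
        self._parts.append("^")
        return self

    def then(self, literal: str) -> "ReadableRegex":
        # re.escape makes the literal match itself, e.g. "." stays a dot.
        self._parts.append(re.escape(literal))
        return self

    def maybe(self, literal: str) -> "ReadableRegex":
        self._parts.append(f"(?:{re.escape(literal)})?")
        return self

    def anything_but(self, chars: str) -> "ReadableRegex":
        self._parts.append(f"[^{re.escape(chars)}]*")
        return self

    def end_of_line(self) -> "ReadableRegex":
        self._parts.append("$")
        return self

    def compile(self) -> "re.Pattern":
        return re.compile("".join(self._parts))


url = (ReadableRegex().start_of_line().then("http").maybe("s").then("://")
       .maybe("www.").anything_but(" ").end_of_line().compile())
```

The resulting object is an ordinary compiled pattern, so `url.match("https://www.example.com")` behaves exactly like a hand-written regex; the builder only changes how the pattern is written down.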
You could use the re.VERBOSE flag:
import re
charref = re.compile(r"""
&[#] # Start of a numeric entity reference
(
0[0-7]+ # Octal form
| [0-9]+ # Decimal form
| x[0-9a-fA-F]+ # Hexadecimal form
)
; # Trailing semicolon
""", re.VERBOSE)
pyparsing offers another way to create and execute (simple) grammars. I've used it in a project to parse different kinds of log files, and it was fairly simple to use and somewhat more intuitive than regexps.
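To see the re.VERBOSE pattern from the earlier answer actually used, here is a small sketch. It repeats that pattern with named groups (an addition of mine, not in the original answer), which is another readability aid: matches are read by name instead of by group number. The `decode_charrefs` helper is hypothetical.

```python
import re

# The same verbose pattern, with one named group per numeric form.
charref = re.compile(r"""
    &[#]                        # Start of a numeric entity reference
    (?:
        (?P<oct>0[0-7]+)        # Octal form
      | (?P<dec>[0-9]+)         # Decimal form
      | x(?P<hex>[0-9a-fA-F]+)  # Hexadecimal form
    )
    ;                           # Trailing semicolon
""", re.VERBOSE)


def decode_charrefs(text: str) -> str:
    """Replace numeric character references with the characters they denote."""
    def replace(m: re.Match) -> str:
        if m.group("oct") is not None:
            return chr(int(m.group("oct"), 8))
        if m.group("dec") is not None:
            return chr(int(m.group("dec")))
        return chr(int(m.group("hex"), 16))
    return charref.sub(replace, text)


print(decode_charrefs("&#65;&#x42;&#0103;"))  # ABC
```

Note that, following the original pattern, a leading `0` selects the octal branch, so `&#0103;` is read as octal 103 (decimal 67).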
Take a look at Ned Batchelder's list of Python parsing tools.
LPeg is a Lua library, not a Python one, I'm afraid, but someone may have ported it. Either way, it is open source, so you could port it yourself if you wanted to. It takes a somewhat different approach to text matching than regular expressions do (it is based on parsing expression grammars, or PEGs), and as such I find it has a considerable learning curve. However, where efficiency is concerned, it has the potential to outperform regular expressions, but obviously such a statement depends strongly on the test case and on one's ability in both languages.
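To give a feel for the PEG approach LPeg is built on, here is a deliberately tiny set of hypothetical Python combinators (not LPeg's API and not a port of it). The key difference from regex alternation is ordered choice: the first alternative that succeeds is committed to, and the matcher never goes back to try the remaining ones if something later fails.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def seq(*parsers: Parser) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            pos = result
        return pos
    return parse


def choice(*parsers: Parser) -> Parser:
    # Ordered choice: commit to the first alternative that succeeds.
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def many(p: Parser) -> Parser:
    # Greedy repetition (zero or more), with no backtracking.
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            result = p(text, pos)
            if result is None or result == pos:
                return pos
            pos = result
    return parse


def full_match(p: Parser, text: str) -> bool:
    return p(text, 0) == len(text)
```

For example, `full_match(seq(choice(lit("a"), lit("ab")), lit("c")), "abc")` is `False`, because the choice commits to `"a"` and `"c"` then fails against `"b"`; the regex `(a|ab)c` does match `"abc"`, since the regex engine backtracks into the alternation. Listing `lit("ab")` first makes the PEG version succeed. This is one reason PEG-based tools feel different and take some getting used to.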
If you're concerned about understanding and debugging other people's regexes, there are tools that translate them into something easier to understand. My favorite is RegexBuddy on Windows. On the Mac, RegExRx from the App Store is helpful.
I am writing an application which reads an input file that currently has its own grammar, which is processed by lex/yacc.
I'm looking to modify this so as to make this input file a Python script instead, and was wondering if someone can point me to a beginner's guide to using the parser module in Python. I'm fairly new to Python itself, but have worked through a fair chunk of the online tutorial.
From what I have researched, I know there are options (such as pyparsing) that would let me keep the existing grammar and use pyparsing as a replacement for lex/yacc. However, I am curious to learn the Python parser module in more detail and explore its feasibility.
You mean the parser module? It's a parser for Python source code only, not a general-purpose parser. You can't use it to parse anything else.
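A quick way to see this for yourself: the `parser` module was deprecated in Python 3.9 and removed in 3.10, and the standard `ast` module is the usual way to get at the same information today. Both understand only Python's own grammar, so anything written in another syntax is simply a syntax error. The input line in the sketch below is a made-up example of a non-Python configuration line.

```python
import ast


def is_python(source: str) -> bool:
    """Return True if `source` parses as Python, False otherwise."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Valid Python parses into a tree of node objects...
tree = ast.parse("x = 1 + 2")
print(type(tree.body[0]).__name__)  # Assign

# ...but a line from some other grammar is rejected outright.
print(is_python("set speed := 10 km/h;"))  # False
```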
As Jochen said, the parser module is for parsing Python code. I think you're best off checking out Ned Batchelder's list of parsers. PyParsing does things pretty differently from Lex and Yacc, so I'm not sure why you think you could keep your existing grammar and lexer. A better bet might be David Beazley's PLY toolkit. It's solid and has excellent documentation.
I recommend that you check out https://github.com/erezsh/lark
It's great for newcomers to parsing: It can parse ALL context-free grammars, it automatically builds an AST (with line & column numbers), and it accepts the grammar in EBNF format, which is considered the standard and is very easy to write.
Hey, as a project to improve my programming skills, I've begun programming a nice code editor in Python to teach myself project management, version control and GUI programming. I want to use syntax files made for other programs so that I would already have a large collection. I was wondering if there is any kind of universal syntax file format, in the same sense as .odt files. I once heard of one in a forum, and it had a website, but I can't remember it now. If not, I may just try to use gedit or Geany syntax files.
If you're planning to do syntax highlighting, check out Pygments, especially the bit about lexers.
Since you mentioned Geany, you might want to look at the Scintilla docs (Geany is built on Scintilla).
You might find this post interesting.
Also, be sure to get familiar with the venerable lex and yacc.
Not sure what .odt has to do with any of this.
I could see some sort of BNF being able to describe (almost) any syntax: Just run the text and the BNF through a parser, and apply a color scheme to the terminals. You could even get a bit more fancy, since you'd have the syntax tree.
In reality, I think most syntax files take an easier approach, such as regular expressions. This puts them somewhere above plain regular expressions in power, but not quite context-free.
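A minimal sketch of the regular-expression approach: an ordered list of (token type, pattern) rules tried at each position, with the first match winning. That is roughly the core of many editors' syntax definitions and of Pygments' `RegexLexer`, which adds states on top. The rules, token names and colors below are hypothetical and cover only a toy subset of Python.

```python
import re
from typing import Iterator, Tuple

# Hypothetical rules for a toy subset of Python; order matters, first match wins.
RULES = [
    ("comment", r"#[^\n]*"),
    ("string", r"'[^'\n]*'|\"[^\"\n]*\""),
    ("keyword", r"\b(?:def|return|if|else|for|in|while)\b"),
    ("number", r"\b\d+(?:\.\d+)?\b"),
    ("name", r"[A-Za-z_]\w*"),
    ("space", r"\s+"),
    ("other", r"."),
]
# One combined pattern with a named group per rule.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in RULES))


def tokenize(code: str) -> Iterator[Tuple[str, str]]:
    """Yield (token_type, text) pairs that together cover the whole input."""
    for m in MASTER.finditer(code):
        yield m.lastgroup, m.group()


# A "color scheme" is then just a mapping from token type to a style.
COLORS = {"keyword": "blue", "string": "green", "comment": "grey", "number": "red"}

for kind, text in tokenize("def f(x): return x + 1  # add one"):
    if kind in COLORS:
        print(f"{text!r} -> {COLORS[kind]}")
```

Anything that needs real nesting (for example, matching brackets across lines) is where this approach starts to strain and a grammar-based parser becomes attractive.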
As for file formats, if you re-use something that exists, then you can just loot and pillage (subject to license agreements) their syntax file data.
I am creating (or rather, researching the possibility of) a highly customizable Python client, and I would like to let users edit code in another language to customize how the program runs (analogous to a browser, which is itself written in C/C++ but runs another language, HTML/JS). So my question is: is there any programming language implemented in pure Python that I could look at as a reference (or use directly)? I need a simple language; simple statements and ifs would do.
Edit: Sorry if I did not make myself clear, but what I want is "a language to customize the running of the program". Even though PyPy seems a great option, what I am looking for is something simpler that I can study and extend myself if the need arises. My Google searches point towards XML-based languages (BMEL, XForms, etc.).
The question isn't completely clear on scope, but I have a hunch that PyPy, embedding other full languages, and similar solutions might be overkill. It sounds like iamgopal may really be interested in something more like the Interpreter pattern or a little language.
If the language you want to support is really small (see the Interpreter Pattern link), then hand-coding this yourself in Python won't be too hard. You can write a simple parser (Google around; here's one example), then walk the AST and evaluate user expressions.
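For a sense of scale, here is a minimal sketch of that approach for a hypothetical little language of integer arithmetic with variables: a tokenizer, a recursive-descent parser that builds a small AST out of nested tuples, and a tree-walking evaluator. The grammar and all names are illustrative assumptions, not taken from the example linked in the answer.

```python
import re
from typing import Dict, List, Union

Node = Union[int, str, tuple]  # int literal, variable name, or (op, ...) tuple
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")


def tokenize(src: str) -> List[str]:
    return [number or name or op for number, name, op in TOKEN_RE.findall(src)]


class Parser:
    """Recursive descent: expr := term (+|- term)*, term := factor (*|/ factor)*."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ""

    def take(self) -> str:
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self) -> Node:
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.take(), node, self.term())
        return node

    def term(self) -> Node:
        node = self.factor()
        while self.peek() in ("*", "/"):
            node = (self.take(), node, self.factor())
        return node

    def factor(self) -> Node:
        tok = self.take()
        if tok.isdecimal():
            return int(tok)
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok == "-":
            return ("neg", self.factor())
        if re.fullmatch(r"[A-Za-z_]\w*", tok):
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(node: Node, env: Dict[str, int]) -> int:
    """Walk the AST; unknown variable names raise KeyError."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]
    if node[0] == "neg":
        return -evaluate(node[1], env)
    op, left, right = node
    a, b = evaluate(left, env), evaluate(right, env)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a // b  # integer division keeps this toy language integer-only


def run(src: str, env: Dict[str, int]) -> int:
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.pos != len(parser.tokens):
        raise SyntaxError(f"trailing input at token {parser.peek()!r}")
    return evaluate(tree, env)


print(run("2 + 3 * (x - 1)", {"x": 5}))  # 14
```

Because user code only ever reaches `evaluate`, the language can do nothing beyond what that function implements, which is one of the main attractions of hand-rolling a small language.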
However, if you expect this to be used for a long time or by many people, it may be worth throwing a real language at the problem. (I'd recommend Python itself if your users are already familiar with basic Python syntax.)
Ren'Py is a modification to Python syntax built on top of Python itself, using the language tools in the stdlib.
For your user's sake, don't use an XML based language - XML is an awful basis for a programming language and your users will hate you for it.
Here is a suggestion. Use a strict subset of Python for your language. Use the compiler module to convert their code into an abstract syntax tree and walk the tree to validate that the code conforms to your subset before converting the AST into Python bytecode.
N.B. I just checked the docs and see that the compiler package is deprecated in 2.6 and removed in Python 3.x. Does anyone know why that is?
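In Python 3 the same idea can be expressed with the standard `ast` module (the `compiler` package is gone): parse the user's code, walk every node, reject anything outside an allow-list, and only then compile the checked tree. The allow-list and the host function names below are hypothetical choices for "simple statements and ifs". An allow-list like this narrows what scripts can do, but it is a sketch, not a proven sandbox.

```python
import ast

# Hypothetical allow-list: assignments, if/elif/else, comparisons,
# boolean logic, simple arithmetic, names, constants and approved calls.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.If, ast.Compare, ast.BoolOp,
    ast.UnaryOp, ast.BinOp, ast.Name, ast.Load, ast.Store, ast.Constant,
    ast.Call, ast.And, ast.Or, ast.Not, ast.Eq, ast.NotEq, ast.Lt, ast.LtE,
    ast.Gt, ast.GtE, ast.Add, ast.Sub, ast.Mult, ast.USub,
)
ALLOWED_CALLS = {"activate", "deactivate"}  # hypothetical host functions


def compile_subset(source: str):
    """Reject any node outside the allow-list, then compile the checked tree."""
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"{type(node).__name__} is not allowed")
        if isinstance(node, ast.Call) and not (
            isinstance(node.func, ast.Name) and node.func.id in ALLOWED_CALLS
        ):
            raise ValueError("only approved functions may be called")
    # compile() accepts an AST object directly.
    return compile(tree, "<user script>", "exec")


actions = []
namespace = {"__builtins__": {}, "activate": actions.append, "speed": 12}
exec(compile_subset("if speed > 10:\n    activate('turbo')\n"), namespace)
print(actions)  # ['turbo']
```

`elif` needs no extra entry, because the AST represents it as a nested `If` in the `orelse` branch.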
Numerous template languages, such as Cheetah, Django templates, Genshi, Mako and Myghty, might serve as examples.
Why not Python itself? With some care you can use eval to run user code.
One of the good things about interpreted scripting languages is that you don't need an extra scripting language!
PLY (Python Lex-Yacc) might be of interest to you.
Common Lisp (or any other Lisp) might be the best choice for this task, because Lisp makes it easy to extend the host language with powerful macros and to construct a DSL (domain-specific language).
If all you need is simple if statements and expressions, I'm sure it wouldn't be an awful task to parse each line. Something like:
if some flag
    activate some feature
    deactivate some feature
elif some other flag
    activate some feature
    activate some feature
else
    logout
Just write a class that, while parsing, takes the first word of each line and checks whether it is "if", "elif", "else", etc. If so, check a flag and set a flag saying whether or not you are executing until the next conditional. If it's not a conditional, call a function chosen by that first keyword, which modifies the program state in some way.
The class could store some local execution state (are we in an if statement? If so, are we executing this branch?) and have another class containing some global application state (flags that can be checked by if statements, etc.).
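A minimal sketch of that design, under a few assumptions of mine: one statement per line, conditions are bare flag names, an `if`/`elif`/`else` chain runs until the next `if` or the end of the script (no nesting and no `end` keyword), and `activate`, `deactivate` and `logout` are hypothetical actions. `AppState` holds the global flags, and `ScriptRunner` tracks the local execution state while it runs; only keywords in its `actions` table can do anything.

```python
from typing import Callable, Dict, Set


class AppState:
    """Global application state: flags scripts may test, features they toggle."""

    def __init__(self, flags: Set[str]) -> None:
        self.flags = set(flags)
        self.features: Set[str] = set()
        self.logged_out = False


class ScriptRunner:
    """Runs the hypothetical line-based if/elif/else mini-language."""

    def __init__(self, app: AppState) -> None:
        self.app = app
        # Only these keywords can do anything: keep this table as small as possible.
        self.actions: Dict[str, Callable[[str], None]] = {
            "activate": self.app.features.add,
            "deactivate": self.app.features.discard,
            "logout": lambda _arg: setattr(self.app, "logged_out", True),
        }

    def run(self, script: str) -> None:
        executing = True      # local state: are we running the current branch?
        branch_taken = False  # local state: has any branch of this chain run?
        for lineno, raw in enumerate(script.splitlines(), start=1):
            line = raw.strip()
            if not line:
                continue
            keyword, _, rest = line.partition(" ")
            rest = rest.strip()
            if keyword == "if":
                executing = rest in self.app.flags
                branch_taken = executing
            elif keyword == "elif":
                executing = not branch_taken and rest in self.app.flags
                branch_taken = branch_taken or executing
            elif keyword == "else":
                executing = not branch_taken
                branch_taken = True
            elif keyword in self.actions:
                if executing:
                    self.actions[keyword](rest)
            else:
                raise SyntaxError(f"line {lineno}: unknown keyword {keyword!r}")


SCRIPT = """
if some_flag
    activate feature_a
    deactivate feature_b
elif other_flag
    activate feature_c
else
    logout
"""

app = AppState({"other_flag"})
ScriptRunner(app).run(SCRIPT)
print(sorted(app.features), app.logged_out)  # ['feature_c'] False
```

Rejecting unknown keywords, rather than ignoring them, keeps typos in user scripts from silently doing nothing.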
This is probably the wrong thing to do in your situation (it's very prone to bugs, it's dangerous if you don't treat the data in the scripts correctly), but it's at least a start if you do decide to interpret your own mini-language.
Seriously though, if you try this, be very, very careful. Don't give the scripts any functionality that they don't definitely need, because you are almost certainly opening security holes by doing something like this.
Don't say I didn't warn you.
I'd like to parse Python source in order to try making a basic source code converter from Python to Go.
What module should I use?
Should I proceed or not?
If I should proceed, how?
Have a look at the language services packages, particularly the ast module.
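As a first step with `ast`, a sketch like the following parses a source string and walks the tree with an `ast.NodeVisitor`, collecting each function's name and parameter names. That is the kind of information a converter would need before it could emit Go function signatures. The `FunctionCollector` name is hypothetical.

```python
import ast
from typing import List, Tuple

SOURCE = '''
def add(a, b):
    return a + b

def greet(name):
    print("hello", name)
'''


class FunctionCollector(ast.NodeVisitor):
    """Record (function name, parameter names) for every def in the tree."""

    def __init__(self) -> None:
        self.functions: List[Tuple[str, List[str]]] = []

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        params = [arg.arg for arg in node.args.args]
        self.functions.append((node.name, params))
        self.generic_visit(node)  # keep walking so nested defs are found too


collector = FunctionCollector()
collector.visit(ast.parse(SOURCE))
print(collector.functions)  # [('add', ['a', 'b']), ('greet', ['name'])]
```

Notice what is missing: Python source carries no parameter or return types, so a real converter would have to infer them or require annotations. That is a large part of why this project is hard.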
My guess is that if you don't already have a solid grasp of both parsing as well as code generation techniques, this is going to be a difficult project to undertake.
Good luck!
As for the 'should I go ahead or better not' question: why do you want to do this in the first place?
If it's purely a learning exercise, then you don't need to ask us whether it's worthwhile. You want to learn, so go right ahead.
If it's meant to be a practical tool, then my suggestion is not to do it. An industrial-strength tool to perform such conversions might be useful, but I would guess that you're not going to go that far. With that in mind, it's probably more fruitful to rewrite the Python code in Go manually.
That assumes there is any real benefit to compiling to Go; current testing suggests that you get better performance and similar code structure from using Stackless Python.
The Boo Solution
Are you trying to make a Python-like language that compiles into Go? This seems most sensible, as you will want to do Go-specific things (to take advantage of Go features).
Look at pyparsing. It includes an example of a complete python parser, but you probably don't want to do that.
You want to build your converter/translator incrementally, so you want to build the parser incrementally too; otherwise you might choke on the AST. OK, you could parse everything and just ignore the stuff you don't understand, but that's not great behavior for a compiler.
You could start with parsing basic arithmetic.
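Starting with basic arithmetic could look like the sketch below. It swaps in the standard `ast` module instead of pyparsing (to stay within the standard library) and translates a Python arithmetic expression into Go expression source. It deliberately supports only numbers, names, unary minus, `+`, `-` and `*`, and raises on everything else, so unsupported constructs fail loudly instead of being silently ignored. Division is left out on purpose: Python's `/`, `//` and `%` do not map one-to-one onto Go's `/` and `%` (Go's integer division truncates toward zero, while Python's `//` floors). The `to_go` name is hypothetical, and the sketch ignores differences such as Go's fixed-size integers and stricter typing.

```python
import ast

# Python operator node type -> Go operator, for operators whose meaning
# broadly carries over (ignoring overflow and mixed int/float typing).
GO_BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def to_go(expr: str) -> str:
    """Translate a Python arithmetic expression into Go expression source."""
    return _emit(ast.parse(expr, mode="eval").body)


def _emit(node: ast.AST) -> str:
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return repr(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        # Parenthesized so that "- -y" never becomes Go's "--" token.
        return f"(-{_emit(node.operand)})"
    if isinstance(node, ast.BinOp) and type(node.op) in GO_BINOPS:
        # Parenthesize every operation so precedence never needs thought.
        op = GO_BINOPS[type(node.op)]
        return f"({_emit(node.left)} {op} {_emit(node.right)})"
    # Fail loudly on anything not supported yet instead of skipping it.
    raise NotImplementedError(f"unsupported construct: {ast.dump(node)}")


print(to_go("2 * (x + 3) - -y"))  # ((2 * (x + 3)) - (-y))
```

Each new construct (comparisons, calls, assignments) can then be added as one more branch in `_emit`, which keeps the translator growing incrementally, as suggested above.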
The Pyrex Solution
This is similar to the Boo solution, but much harder. Get the Boo solution working first. Then learn to generate wrapper code, so your Go and Python parts can work together.
The PyPy Solution
A complete Python-Go compiler? Good luck. You'll need it.
There's a good list of parsers rounded-up by Ned Batchelder which might help.