Lexical Analysis of Python Programming Language

Does anyone know where a FLEX or LEX specification file for Python exists? For example, this is a lex specification for the ANSI C programming language: http://www.quut.com/c/ANSI-C-grammar-l-1998.html
FYI, I am trying to write code highlighting into a Cocoa application. Regex won't do it because I also want grammar parsing to fold code and recognize blocks.

Lex is typically just used for tokenizing, not full parsing. Projects that use flex/lex for tokenizing typically use yacc/bison for the actual parsing.
You may want to take a look at ANTLR, a more "modern" alternative to lex & yacc.
The ANTLR Project has a GitHub repo containing many ANTLR 4 grammars, including at least one for Python 3.
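
To make the tokenizing-versus-parsing distinction concrete, here is a small sketch using the tokenize module from Python's standard library. It is only an illustration of what a lexer produces and assumes you can call into Python at all; the sample source string is made up. Note that the token stream already carries INDENT and DEDENT markers, which hint at block boundaries even before any real parsing happens.

```python
import io
import tokenize

# Made-up sample program to tokenize.
source = "if x:\n    y = 1\nz = 2\n"

# generate_tokens takes a readline callable and yields TokenInfo tuples.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))

# INDENT/DEDENT tokens mark where an indented block opens and closes,
# which is the raw material a folding feature would build on.
block_markers = [
    tokenize.tok_name[t.type]
    for t in tokens
    if t.type in (tokenize.INDENT, tokenize.DEDENT)
]
print(block_markers)
```

A highlighter can colour text from the token types alone; recognizing nested constructs reliably still needs a parser on top.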

grammar.txt is the official, complete Python grammar -- not directly lex compatible, but you should be able to massage it into a suitable form.

Have you considered using one of the existing code highlighters, like Pygments?

Related

Is there a reliable python library for taking a BibTex entry and outputting it into specific formats?

I'm developing using Python and Django for a website. I want to take a BibTex entry and output it in a view in 3 different formats, MLA, APA, and Chicago. Is there a library out there that already does this or am I going to have to manually do the string formatting?

There are the following projects:
BibtexParser
Pybtex
Pybliographer
BabyBib
If you need complex parsing and output, Pybtex is recommended. Example:
>>> from pybtex.database.input import bibtex
>>> parser = bibtex.Parser()
>>> bib_data = parser.parse_file('examples/foo.bib')
>>> list(bib_data.entries.keys())
['ruckenstein-diffusion', 'viktorov-metodoj', 'test-inbook', 'test-booklet']
>>> print(bib_data.entries['ruckenstein-diffusion'].fields['title'])
Predicting the Diffusion Coefficient in Supercritical Fluids
Good luck.

Having tried them, I found all of these projects bad for various reasons: terrible APIs, bad documentation, and a failure to parse valid BibTeX files. The implementation you want doesn't show up in most Google searches, at least in my own searching: it's biblib. This text from the README should sell it:
There are a lot of BibTeX parsers out there. Most of them are complete nonsense based on some imaginary grammar made up by the module's author that is almost, but not quite, entirely unlike BibTeX's actual grammar. BibTeX has a grammar. It's even pretty simple, though it's probably not what you think it is. The hardest part of BibTeX's grammar is that it's only written down in one place: the BibTeX source code.

The accepted answer of using pybtex is fraught with danger as Pybtex does not preserve the bibtex format of even simple bibtex files. (https://bitbucket.org/pybtex-devs/pybtex/issues/130/need-to-specially-represent-bibtex-markup)
Pybtex is therefore losing bibtex information when reading and re-writing a simple .bib file without making any changes. Users should be very careful following the recommendations to use pybtex.
I will try biblib as well and report back but the accepted answer should be edited to not recommend pybtex.
Edit:
I was able to import the data using BibtexParser without any loss of data. However, I had to install it from source at https://github.com/sciunto-org/python-bibtexparser as the version installed via pip was buggy at the time. Users should verify that pip is getting the latest version.
As for exporting, once the data has been imported via BibtexParser, it's in a dictionary and can be exported as the user desires. BibtexParser does not have built-in functions for exporting in common formats. As I did not need this functionality, I didn't specifically test it. However, once the data is in a dictionary, it can be converted to a string in any citation format rather easily.
Here, Pybtex and a custom style file can help. I used the style file provided by the journal and compiled in LaTeX instead, but Pybtex has Python style files (and also allows ingesting .bst files). So I would recommend taking the BibtexParser input and transferring it to Pybtex (or similar) for outputting in a certain style.
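
As a rough sketch of the "dictionary to citation string" step, the function below formats one entry in an APA-like shape. It assumes the entry is a plain dict with 'author', 'year', 'title' and 'journal' keys; the function name, the sample data and the exact punctuation are illustrative, not a complete implementation of the APA rules and not part of any of the libraries above.

```python
def format_apa_like(entry):
    """Return a rough APA-flavoured string for one entry dict.

    `entry` is assumed to be a plain dict with 'author', 'year',
    'title' and 'journal' keys (hypothetical field layout).
    """
    # BibTeX separates multiple authors with the word "and".
    authors = [a.strip() for a in entry["author"].split(" and ")]
    if len(authors) > 1:
        author_text = ", ".join(authors[:-1]) + ", & " + authors[-1]
    else:
        author_text = authors[0]
    return "{} ({}). {}. {}.".format(
        author_text, entry["year"], entry["title"], entry["journal"]
    )


# Made-up entry, only for demonstration.
entry = {
    "author": "Doe, J. and Roe, R.",
    "year": "2001",
    "title": "An example title",
    "journal": "Journal of Examples",
}
citation = format_apa_like(entry)
print(citation)
```

MLA and Chicago would need their own functions with different ordering and punctuation, which is where a style engine starts to pay off.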

The closest thing I know of is the pybtex package

Can an ANTLR grammar file be modified to be used by PLY?

I want to make a Python program that uses PLY to parse JavaScript files. I didn't find any parsers that implement the ECMAScript/JavaScript rules using PLY.
The only thing I found was some ANTLR grammar files to parse javascript and ecmascript:
http://www.antlr.org/grammar/1153976512034/ecmascriptA3.g
http://www.antlr.org/grammar/1206736738015/JavaScript.g
Can ANTLR grammar files be adapted to be used as PLY rules, and if so, how can this be done in a semi-automatic way? Do I need to parse the grammar files? Is there another workaround (i.e. other than using ANTLR grammar files)?

Can ANTLR grammar files be adapted to be used as PLY rules, [...] ?
No, they can't. PLY generates LALR parsers while ANTLR generates LL ones. Their input grammars are too different for a trivial (or automated) conversion.
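
One commonly cited difference between the two families is left recursion. An LALR tool accepts a rule such as expr : expr '+' term directly, while a classic top-down LL parser has to express the same thing as repetition, expr : term ('+' term)*. The hand-written sketch below shows the LL-style shape for sums of integers; it uses neither PLY nor ANTLR, and the function names are made up for illustration.

```python
import re


def tokenize(text):
    # Split into integer literals and '+' signs; whitespace is skipped.
    return re.findall(r"\d+|\+", text)


def parse_expr(tokens):
    """LL-style rule:  expr : term ('+' term)*

    A direct transcription of the left-recursive rule
    expr : expr '+' term  would have the function call itself
    before consuming any token, so it would never terminate.
    """
    pos = 0
    value = int(tokens[pos])  # first term
    pos += 1
    while pos < len(tokens) and tokens[pos] == "+":
        value += int(tokens[pos + 1])  # '+' term
        pos += 2
    return value


result = parse_expr(tokenize("1 + 2 + 3"))
print(result)
```

Rewriting rules between these two shapes changes the structure of the grammar, which is part of why a mechanical conversion is not trivial.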

Lout preprocessor written in Python used during the production of the book "C++ GUI Programming with Qt"

The authors (Jasmin Blanchette & Mark Summerfield) of C++ GUI Programming with Qt have disclosed production details at the end of the book.
Quote:
The authors wrote the text using NEdit and Vim. They typeset and
indexed the text themselves, marking it up with a modified Lout syntax
that they converted to pure Lout using a custom preprocessor written
in Python.
References:
Lout official Website
Wikipedia article on Lout
My question:
Can somebody point to me where I can find details on such grammar derived from Lout along with its accompanying tool written in Python (a preprocessor)?
Edit:
Using any substitute of Lout is not an option.

Can somebody point to me where I can find details on such grammar derived from Lout?
You would be better off looking at a more established typesetting grammar like LaTeX, unless you're looking for an already written Lout preprocessor.
...along with its accompanying tool written in Python (a preprocessor)?
If I understand correctly, Jasmin Blanchette & Mark Summerfield developed their own typesetting grammar, which they converted to Lout. Not knowing for sure what they did, I'm assuming it was mostly symbol substitution. To take an example from LaTeX, converting \circle to \bigcirc.
After looking at Lout, I could see where it would be relatively easy to write an HTML to Lout converter.
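
If the preprocessor really was mostly symbol substitution, which is only a guess, the core of such a tool could look like the sketch below. The substitution table holds just the LaTeX example from above and is hypothetical; this is not the authors' actual preprocessor, and no real Lout syntax is assumed.

```python
import re

# Hypothetical mapping from custom markup to target markup.
SUBSTITUTIONS = {
    r"\circle": r"\bigcirc",
}

# Try longer keys first so a short key never shadows a longer one.
_pattern = re.compile(
    "|".join(
        re.escape(key)
        for key in sorted(SUBSTITUTIONS, key=len, reverse=True)
    )
)


def preprocess(text):
    # A function replacement avoids backslash-escape handling in re.sub.
    return _pattern.sub(lambda m: SUBSTITUTIONS[m.group(0)], text)


converted = preprocess(r"A \circle B")
print(converted)
```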

I found txt2tags on Google Code.
It targets Lout.
It's written in Python.

Python HTML parsing

I am currently trying to make a program that given a word will look up its definition and return it. Although I have gotten this to work, I had to resort to using RegEx to search for the text between the tags where the definitions are stored. What is a more efficient way to do this using python 3.x?

lxml works for Python 3. It has an ElementTree-compatible API but uses C libraries behind the scenes, so it's fast, and it supports XPath, which is a nice way of parsing (sometimes).

Try BeautifulSoup, a good HTML parser for Python. (It works with Python 3.x too, although unless you are deep into a Python 3.0 project, consider using 2.7.)

Yours is a pretty simple requirement when it comes to HTML parsing. The Python standard library includes the ElementTree module, which should be helpful for the task you are planning to undertake. Look for the example snippet given on that page.
Also, never make the mistake of parsing HTML/XML using regex. You may not know when it will get insanely complicated and it is a bad idea under any situation too.
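
As a minimal sketch of the non-regex approach, the example below uses html.parser from the standard library (a different stdlib module from ElementTree, chosen here because it does not require well-formed XML). The class name, the sample page and the assumption that definitions live in span elements with class "definition" are all made up for illustration.

```python
from html.parser import HTMLParser


class DefinitionExtractor(HTMLParser):
    """Collect the text inside every <span class="definition"> element."""

    def __init__(self):
        super().__init__()
        self.definitions = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == "span" and ("class", "definition") in attrs:
            self._inside = True
            self.definitions.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.definitions[-1] += data


# Hypothetical page fragment.
page = '<div><p>cat</p><span class="definition">a small feline</span></div>'
extractor = DefinitionExtractor()
extractor.feed(page)
extractor.close()
definitions = extractor.definitions
print(definitions)
```

This sketch does not handle spans nested inside a definition; for messier pages, lxml or BeautifulSoup as suggested above are likely a better fit.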

How to search code snippets

For example, I want to know how to use Python pickle serialization & deserialization. Since I've never used it, reading the official Python docs would be a great reference, but I prefer snippets/example code, with or without descriptions, such as sites for Python beginners, someone's blog, or Google Code.
How would you search? Would you go to specific sites, or use particular keywords? Actually this is a general question, not only for Python but for learning any language. Thanks.

Google Code Search.
From the FAQ:
We're crawling as much publicly
accessible source code as we can find,
including archives (.tar.gz, .tar.bz2,
.tar, and .zip), CVS repositories and
Subversion repositories.
Sample search: http://www.google.com/codesearch?q=lang%3Apython+%22cpickle%22
The operators are handy:
The lang: operator, which restricts by programming language (e.g., lang:"c++", -lang:java, or lang:^(c|c#|c++)$)
The license: operator, which restricts by software license (e.g., license:apache, -license:gpl, or license:bsd|mit)
The package: operator, which restricts by package URL (e.g., package:"www.kernel.org" or package:.tgz$)
The file: operator, which restricts by filename (e.g., file:include/linux/$ or -file:.cc$)

You can also look at ActiveState's Python Recipes:
http://code.activestate.com/recipes/langs/python/
Here are their recipes for Python pickling:
http://code.activestate.com/search/#q=pickle python
O'Reilly's Python Cookbook is also good. You can read it online with a Safari membership.

There is also Nullege, a search engine especially for Python code.

GitHub search is pretty good. It's usually used to search for repositories, but its code search works well.

In general, Google Code Search is a pretty good place to look for code snippets. To look for Python pickle examples, I'd do a search like
lang:python pickle examples
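
For the pickle case in particular, the kind of snippet such a search tends to turn up looks roughly like this minimal round trip. The sample record is made up; only pickle.dumps and pickle.loads from the standard library are used.

```python
import pickle

# Made-up data to serialize.
record = {"name": "example", "values": [1, 2, 3]}

# dumps serializes the object to bytes; loads rebuilds an equal object.
data = pickle.dumps(record)
restored = pickle.loads(data)

print(restored == record)
```

Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.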