Parsing blocks as Python - python

I am writing a lexer + parser in JFlex + CUP, and I wanted to have Python-like syntax regarding blocks; that is, indentation marks the block level.
I am unsure of how to tackle this, and whether it should be done at the lexical or sintax level.
My current approach is to solve the issue at the lexical level - newlines are parsed as instruction separators, and when one is processed I move the lexer to a special state which checks how many characters are in front of the new line and remembers in which column the last line started, and accordingly introduces and open block or close block character.
However, I am running into all sort of trouble. For example:
JFlex cannot match empty strings, so my instructions need to have at least one blanck after every newline.
I cannot close two blocks at the same time with this approach.
Is my approach correct? Should I be doing things different?

Your approach of handling indents in the lexer rather than the parser is correct. Well, it’s doable either way, but this is usually the easier way, and it’s the way Python itself (or at least CPython and PyPy) does it.
I don’t know much about JFlex, and you haven’t given us any code to work with, but I can explain in general terms.
For your first problem, you're already putting the lexer into a special state after the newline, so that "grab 0 or more spaces" should be doable by escaping from the normal flow of things and just running a regex against the line.
For your second problem, the simplest solution (and the one Python uses) is to keep a stack of indents. I'll demonstrate something a bit simpler than what Python does.
First:
indents = [0]
After each newline, grab a run of 0 or more spaces as spaces. Then:
if len(spaces) == indents[-1]:
pass
elif len(spaces) > indents[-1]:
indents.append(len(spaces))
emit(INDENT_TOKEN)
else:
while len(spaces) != indents[-1]:
indents.pop()
emit(DEDENT_TOKEN)
Now your parser just sees INDENT_TOKEN and DEDENT_TOKEN, which are no different from, say, OPEN_BRACE_TOKEN and CLOSE_BRACE_TOKEN in a C-like language.
Of you’d want better error handling—raise some kind of tokenizer error rather than an implicit IndexError, maybe use < instead of != so you can detect that you’ve gone too far instead of exhausting the stack (for better error recovery if you want to continue to emit further errors instead of bailing at the first one), etc.
For real-life example code (with error handling, and tabs as well as spaces, and backslash newline escaping, and handling non-syntactic indentation inside of parenthesized expressions, etc.), see the tokenize docs and source in the stdlib.

Related

why does python function def, class def, if else definition require a colon ":" [duplicate]

What is the purpose of the colon before a block in Python?
Example:
if n == 0:
print "The end"
The colon is there to declare the start of an indented block.
Technically, it's not necessary; you could just indent and de-indent when the block is done. However, based on the Python koan “explicit is better than implicit” (EIBTI), I believe that Guido deliberately made the colon obligatory, so any statement that should be followed by indented code ends in a colon. (It also allows one-liners if you continue after the colon, but this style is not in wide use.)
It also makes the work of syntax-aware auto-indenting editors easier, which also counted in the decision.
This question turns out to be a Python FAQ, and I found one of its answers by Guido here:
Why are colons required for the if/while/def/class statements?
The colon is required primarily to enhance readability (one of the results of the experimental ABC language). Consider this:
if a == b
print a
versus
if a == b:
print a
Notice how the second one is slightly easier to read. Notice further how a colon sets off the example in this FAQ answer; it’s a standard usage in English.
Another minor reason is that the colon makes it easier for editors with syntax highlighting; they can look for colons to decide when indentation needs to be increased instead of having to do a more elaborate parsing of the program text.
Consider the following list of things to buy from the grocery store, written in Pewprikanese.
pewkah
lalala
chunkykachoo
pewpewpew
skunkybacon
When I read that, I'm confused, Are chunkykachoo and pewpewpew a kind of lalala? Or what if chunkykachoo and pewpewpew are indented just because they are special items?
Now see what happens when my Pewprikanese friend add a colon to help me parse the list better: (<-- like this)
pewkah
lalala: (<-- see this colon)
chunkykachoo
pewpewpew
skunkybacon
Now it's clear that chunkykachoo and pewpewpew are a kind of lalala.
Let's say there is a person who's starting to learn Python, which happens to be her first programming language to learn. Without colons, there's a considerable probability that she's going to keep thinking "this lines are indented because this lines are like special items.", and it could take a while to realize that that's not the best way to think about indentation.
Three reasons:
To increase readability. The colon helps the code flow into the following indented block.
To help text editors/IDEs, they can automatically indent the next line if the previous line ended with a colon.
To make parsing by python slightly easier.
As far as I know, it's an intentional design to make it more obvious, that the reader should expect an indentation after the colon.
It also makes constructs like this possible:
if expression: action()
code_continues()
since having the code for the if immediately following the colon makes it possible for the compiler to understand that the next line should not be indented.
According to Guido Van Rossum, the Python inventor, the idea of using a colon to make the structure more apparent is inspired by earlier experiments with a Python predecessor, ABC language, which also targeted the beginners. Apparently, on their early tests, beginner learners progressed faster with colon than without it. Read the whole story at Guido's post python history blog.
http://python-history.blogspot.com/2009/02/early-language-design-and-development.html
And yes, the colon is useful in one-liners and is less annoying than the semicolon. Also style guide for long time recommended break on several lines only when it ends with a binary operator
x = (23 +
24 +
33)
Addition of colon made compound statement look the same way for greater style uniformity.
There is a 'colonless' encoding for CPython as well as colon-less dialect, called cobra. Those did not pick up.

Python - How to style this line of code?

First of, I'm not sure if SO is the right place for this question, so feel free to move it somewhere more appropriate if necessary.
cmd_folder = os.path.realpath(os.path.abspath(os.path.split(inspect.getfile( inspect.currentframe() ))[0]))
I have this line of code. The Python PEP 8 document recommends limiting lines to 79 characters to preserve readability on smaller screens.
What is the most elegant way to style this line of code to fit the PEP recommendations?
cmd_folder = os.path.realpath(os.path.abspath(os.path.split
(inspect.getfile
( inspect.currentframe() ))[0]))
Is this the most appropriate way for is there a better one that I have not thought of?
I would say in the case of your example code, it would be more appropriate to split them up into individual operation, as opposed to making the code harder to read.
aFile = inspect.getfile(inspect.currentframe())
cmd_folder = os.path.realpath(os.path.abspath(os.path.split(aFile)[0]))
It should not be such a chore to trace the start and close of all those parenthesis to figure out what is happening with each temp variable. Variable names can help with clarity, by naming the intention/type of the results.
If it is two, maybe 3, nested calls I might do the act of newlines in one call, but definitely not when you have a ton of parenthesis, and list indexing squished in between. But normally I am only more inclined to do that with chained calls like foo.bar().biz().baz() since it flows left to right.
Always assume some poor random developer will have to read your code tomorrow.
I am in general with the answer provided by jdi (split into several expressions). But I would like to show another aspect of this issue.
In general, if just trying to properly break and indent this line, you should also follow PEP8 rules on indentation:
Use 4 spaces per indentation level.
For really old code that you don't want to mess up, you can continue to use 8-space tabs.
Continuation lines should align wrapped elements either vertically using Python's implicit line joining inside parentheses, brackets and braces, or using a hanging indent. When using a hanging indent the following considerations should be applied; there should be no arguments on the first line and further indentation should be used to clearly distinguish itself as a continuation line.
[several examples follow]
So in your case it could look like this:
#---------------------------------------------------------79th-column-mark--->|
cmd_folder = os.path.realpath(
os.path.abspath(os.path.split(inspect.getfile(inspect.currentframe()))[0]))
But as I mentioned at the beginning, and as PEP20 (The Zen of Python) mentions:
Flat is better than nested.
Sparse is better than dense.
Readability counts.
thus you should definitely split your code into few expressions, as jdi notes.
I tend to do it like this:
cmd_folder = os.path.realpath(os.path.abspath(os.path.split(
inspect.getfile(inspect.currentframe()))[0]))
Open-parens at the end of the first line, and indention on continuation lines. Ultimately it boils down to what you think is more aesthetic.
There's also no shame in doing part of the computation and saving the result in a variable, like this:
f = inspect.getfile(inspect.currentframe())
cmd_folder = os.path.realpath(os.path.abspath(os.path.split(f)[0]))
If you really insist on keeping it one expression, I'd go with an "extreme" simply because it actually looks nice:
cmd_folder = os.path.realpath(
os.path.abspath(
os.path.split(
inspect.getfile(
inspect.currentframe()
))[0]
))

Reading a line from standard input in Python

What (if any) are the differences between the following two methods of reading a line from standard input: raw_input() and sys.stdin.readline() ? And in which cases one of these methods is preferable over the other ?
raw_input() takes an optional prompt argument. It also strips the trailing newline character from the string it returns, and supports history features if the readline module is loaded.
readline() takes an optional size argument, does not strip the trailing newline character and does not support history whatsoever.
Since they don't do the same thing, they're not really interchangeable. I personally prefer using raw_input() to fetch user input, and readline() to read lines out of a file.
"However, from the point of view of many Python beginners and educators, the use of sys.stdin.readline() presents the following problems:
Compared to the name "raw_input", the name "sys.stdin.readline()" is clunky and inelegant.
The names "sys" and "stdin" have no meaning for most beginners, who are mainly interested in what the function does, and not where in the package structure it is located. The lack of meaning also makes it difficult to remember: is it "sys.stdin.readline()", or " stdin.sys.readline()"? To a programming novice, there is not any obvious reason to prefer one over the other. In contrast, functions simple and direct names like print, input, and raw_input, and open are easier to remember." from here: http://www.python.org/dev/peps/pep-3111/

Why do python statements end with : (colon)? [duplicate]

What is the purpose of the colon before a block in Python?
Example:
if n == 0:
print "The end"
The colon is there to declare the start of an indented block.
Technically, it's not necessary; you could just indent and de-indent when the block is done. However, based on the Python koan “explicit is better than implicit” (EIBTI), I believe that Guido deliberately made the colon obligatory, so any statement that should be followed by indented code ends in a colon. (It also allows one-liners if you continue after the colon, but this style is not in wide use.)
It also makes the work of syntax-aware auto-indenting editors easier, which also counted in the decision.
This question turns out to be a Python FAQ, and I found one of its answers by Guido here:
Why are colons required for the if/while/def/class statements?
The colon is required primarily to enhance readability (one of the results of the experimental ABC language). Consider this:
if a == b
print a
versus
if a == b:
print a
Notice how the second one is slightly easier to read. Notice further how a colon sets off the example in this FAQ answer; it’s a standard usage in English.
Another minor reason is that the colon makes it easier for editors with syntax highlighting; they can look for colons to decide when indentation needs to be increased instead of having to do a more elaborate parsing of the program text.
Consider the following list of things to buy from the grocery store, written in Pewprikanese.
pewkah
lalala
chunkykachoo
pewpewpew
skunkybacon
When I read that, I'm confused, Are chunkykachoo and pewpewpew a kind of lalala? Or what if chunkykachoo and pewpewpew are indented just because they are special items?
Now see what happens when my Pewprikanese friend add a colon to help me parse the list better: (<-- like this)
pewkah
lalala: (<-- see this colon)
chunkykachoo
pewpewpew
skunkybacon
Now it's clear that chunkykachoo and pewpewpew are a kind of lalala.
Let's say there is a person who's starting to learn Python, which happens to be her first programming language to learn. Without colons, there's a considerable probability that she's going to keep thinking "this lines are indented because this lines are like special items.", and it could take a while to realize that that's not the best way to think about indentation.
Three reasons:
To increase readability. The colon helps the code flow into the following indented block.
To help text editors/IDEs, they can automatically indent the next line if the previous line ended with a colon.
To make parsing by python slightly easier.
As far as I know, it's an intentional design to make it more obvious, that the reader should expect an indentation after the colon.
It also makes constructs like this possible:
if expression: action()
code_continues()
since having the code for the if immediately following the colon makes it possible for the compiler to understand that the next line should not be indented.
According to Guido Van Rossum, the Python inventor, the idea of using a colon to make the structure more apparent is inspired by earlier experiments with a Python predecessor, ABC language, which also targeted the beginners. Apparently, on their early tests, beginner learners progressed faster with colon than without it. Read the whole story at Guido's post python history blog.
http://python-history.blogspot.com/2009/02/early-language-design-and-development.html
And yes, the colon is useful in one-liners and is less annoying than the semicolon. Also style guide for long time recommended break on several lines only when it ends with a binary operator
x = (23 +
24 +
33)
Addition of colon made compound statement look the same way for greater style uniformity.
There is a 'colonless' encoding for CPython as well as colon-less dialect, called cobra. Those did not pick up.

using an alternative string quotation syntax in python

Just wondering...
I find using escape characters too distracting. I'd rather do something like this (console code):
>>> print ^'Let's begin and end with sets of unlikely 2 chars and bingo!'^
Let's begin and end with sets of unlikely 2 chars and bingo!
Note the ' inside the string, and how this syntax would have no issue with it, or whatever else inside for basically all cases. Too bad markdown can't properly colorize it (yet), so I decided to <pre> it.
Sure, the ^ could be any other char, I'm not sure what would look/work better. That sounds good enough to me, tho.
Probably some other language already have a similar solution. And, just maybe, Python already have such a feature and I overlooked it. I hope this is the case.
But if it isn't, would it be too hard to, somehow, change Python's interpreter and be able to select an arbitrary (or even standardized) syntax for notating the strings?
I realize there are many ways to change statements and the whole syntax in general by using pre-compilators, but this is far more specific. And going any of those routes is what I call "too hard". I'm not really needing to do this so, again, I'm just wondering.
Python has this use """ or ''' as the delimiters
print '''Let's begin and end with sets of unlikely 2 chars and bingo'''
How often do you have both of 3' and 3" in a string

Categories

Resources