What is the purpose of the colon before a block in Python?
Example:
if n == 0:
    print "The end"
The colon is there to declare the start of an indented block.
Technically, it's not necessary; you could just indent and de-indent when the block is done. However, based on the Python koan “explicit is better than implicit” (EIBTI), I believe that Guido deliberately made the colon obligatory, so any statement that should be followed by indented code ends in a colon. (It also allows one-liners if you continue after the colon, but this style is not in wide use.)
It also makes the work of syntax-aware auto-indenting editors easier, which also counted in the decision.
This question turns out to be a Python FAQ, and I found one of its answers by Guido here:
Why are colons required for the if/while/def/class statements?
The colon is required primarily to enhance readability (one of the results of the experimental ABC language). Consider this:
if a == b
    print a
versus
if a == b:
    print a
Notice how the second one is slightly easier to read. Notice further how a colon sets off the example in this FAQ answer; it’s a standard usage in English.
Another minor reason is that the colon makes it easier for editors with syntax highlighting; they can look for colons to decide when indentation needs to be increased instead of having to do a more elaborate parsing of the program text.
Consider the following list of things to buy from the grocery store, written in Pewprikanese.
pewkah
lalala
    chunkykachoo
    pewpewpew
skunkybacon
When I read that, I'm confused. Are chunkykachoo and pewpewpew a kind of lalala? Or are chunkykachoo and pewpewpew indented just because they are special items?
Now see what happens when my Pewprikanese friend adds a colon to help me parse the list better: (<-- like this)
pewkah
lalala: (<-- see this colon)
    chunkykachoo
    pewpewpew
skunkybacon
Now it's clear that chunkykachoo and pewpewpew are a kind of lalala.
Let's say there is a person who's starting to learn Python, which happens to be her first programming language. Without colons, there's a considerable probability that she's going to keep thinking "these lines are indented because they are like special items", and it could take a while to realize that that's not the best way to think about indentation.
Three reasons:
To increase readability. The colon helps the code flow into the following indented block.
To help text editors and IDEs: they can automatically indent the next line if the previous line ended with a colon.
To make parsing by Python slightly easier.
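The editor point can be sketched in a few lines. This is a deliberately naive illustration, not how any particular editor works: the function name next_line_indent and the four-space indent_width default are assumptions made for the example, and it ignores comments, strings, and open brackets.

```python
def next_line_indent(previous_line, indent_width=4):
    """Guess the indentation (in spaces) for the line after previous_line.

    Hypothetical helper: keeps the current indentation, and adds one
    level when the previous line ends with a colon.
    """
    stripped = previous_line.rstrip()
    current = len(stripped) - len(stripped.lstrip(" "))
    if stripped.endswith(":"):
        return current + indent_width
    return current


if __name__ == "__main__":
    for line in ["if n == 0:", "    print('The end')", "    while True:"]:
        print(repr(line), "->", next_line_indent(line))
```

Without the colon, the editor would have to recognize which statements open a block, which is the "more elaborate parsing" the FAQ mentions.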
As far as I know, it's an intentional design choice to make it more obvious that the reader should expect an indented block after the colon.
It also makes constructs like this possible:
if expression: action()
code_continues()
since having the code for the if immediately following the colon makes it possible for the compiler to understand that the next line should not be indented.
According to Guido van Rossum, Python's creator, the idea of using a colon to make the structure more apparent was inspired by earlier experiments with ABC, a predecessor of Python that also targeted beginners. Apparently, in early tests, beginners progressed faster with the colon than without it. Read the whole story in Guido's post on the Python History blog:
http://python-history.blogspot.com/2009/02/early-language-design-and-development.html
And yes, the colon is useful in one-liners and is less annoying than the semicolon. Also, the style guide long recommended breaking a statement across several lines only where a line ends with a binary operator:
x = (23 +
     24 +
     33)
Adding the colon makes compound statements follow the same pattern, for greater stylistic uniformity.
There is a 'colonless' encoding for CPython, as well as a colon-less dialect called cobra. Neither caught on.
I am writing a lexer + parser in JFlex + CUP, and I wanted to have Python-like syntax regarding blocks; that is, indentation marks the block level.
I am unsure of how to tackle this, and whether it should be done at the lexical or syntax level.
My current approach is to solve the issue at the lexical level: newlines are treated as instruction separators, and when one is processed I move the lexer to a special state that checks how many characters precede the first token on the new line, remembers the column in which the previous line started, and accordingly emits an open-block or close-block character.
However, I am running into all sort of trouble. For example:
JFlex cannot match empty strings, so my instructions need to have at least one blank after every newline.
I cannot close two blocks at the same time with this approach.
Is my approach correct? Should I be doing things differently?
Your approach of handling indents in the lexer rather than the parser is correct. Well, it’s doable either way, but this is usually the easier way, and it’s the way Python itself (or at least CPython and PyPy) does it.
I don’t know much about JFlex, and you haven’t given us any code to work with, but I can explain in general terms.
For your first problem, you're already putting the lexer into a special state after the newline, so that "grab 0 or more spaces" should be doable by escaping from the normal flow of things and just running a regex against the line.
For your second problem, the simplest solution (and the one Python uses) is to keep a stack of indents. I'll demonstrate something a bit simpler than what Python does.
First:
indents = [0]
After each newline, grab a run of 0 or more spaces as spaces. Then:
if len(spaces) == indents[-1]:
    pass
elif len(spaces) > indents[-1]:
    indents.append(len(spaces))
    emit(INDENT_TOKEN)
else:
    while len(spaces) != indents[-1]:
        indents.pop()
        emit(DEDENT_TOKEN)
Now your parser just sees INDENT_TOKEN and DEDENT_TOKEN, which are no different from, say, OPEN_BRACE_TOKEN and CLOSE_BRACE_TOKEN in a C-like language.
Of course you’d want better error handling: raise some kind of tokenizer error rather than an implicit IndexError, maybe use < instead of != so you can detect that you’ve gone too far instead of exhausting the stack (for better error recovery if you want to continue to emit further errors instead of bailing at the first one), and so on.
For real-life example code (with error handling, and tabs as well as spaces, and backslash newline escaping, and handling non-syntactic indentation inside of parenthesized expressions, etc.), see the tokenize docs and source in the stdlib.
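The indent-stack idea can also be written as a small runnable sketch. This is only an illustration under simplifying assumptions (spaces only, no tabs, no line continuations, no brackets); the name indent_tokens and the token kinds "LINE", "INDENT", and "DEDENT" are made up for the example and are not taken from JFlex or from the stdlib tokenize module. It uses < rather than != in the dedent loop so that an inconsistent dedent raises an explicit error.

```python
def indent_tokens(source):
    """Yield (kind, value) pairs for a block-structured source string.

    kind is "INDENT", "DEDENT", or "LINE" (hypothetical token names).
    """
    indents = [0]  # stack of indentation widths of the open blocks
    for line_number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines do not change block structure
        width = len(line) - len(line.lstrip(" "))
        if width > indents[-1]:
            indents.append(width)
            yield ("INDENT", width)
        else:
            # One line may close several blocks at once.
            while width < indents[-1]:
                indents.pop()
                yield ("DEDENT", indents[-1])
            if width != indents[-1]:
                raise SyntaxError(
                    "line %d: dedent does not match any outer block" % line_number
                )
        yield ("LINE", line.strip())
    # Close every block still open at end of input.
    while len(indents) > 1:
        indents.pop()
        yield ("DEDENT", indents[-1])


if __name__ == "__main__":
    example = "if a:\n    if b:\n        x\ny\n"
    for token in indent_tokens(example):
        print(token)
```

In the example input, the final line y closes two blocks in one step, so two DEDENT tokens are emitted before its LINE token. That is the case the question could not handle with a single open/close character per newline.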
For rapidly changing business rules, I'm storing IronPython fragments in XML files. So far this has been working out well, but I'm starting to get to the point where I need more than just one-line expressions.
The problem is that XML and significant whitespace don't play well together. Before I abandon it for another language, I would like to know if IronPython has an alternative syntax.
IronPython doesn't have an alternate syntax. It's an implementation of Python, and Python uses significant indentation (all languages use significant whitespace, not sure why we talk about whitespace when it's only indentation that's unusual in the Python case).
>>> from __future__ import braces
File "<stdin>", line 1
from __future__ import braces
^
SyntaxError: not a chance
All I want is something that will let my users write code like
Ummm... Don't do this. You don't actually want this. In the long run, this will cause endless little issues because you're trying to force too much content into an attribute.
Do this.
<Rule Name="Markup">
    <Formula>(Account.PricingLevel + 1) * .05</Formula>
</Rule>
You should try not to have significant, meaningful stuff in attributes. As a general XML design policy, you should use tags and save attributes for names and IDs and the like. When you look at well-done XSDs and DTDs, you see that attributes are used minimally.
Having the body of the rule in a separate tag (not an attribute) saves much pain. And it allows a tool to provide correct CDATA sections. Use a tool like Altova's XML Spy to ensure that your tags have space preserved properly.
I think you can set the xml:space="preserve" attribute or use a <![CDATA[ section to avoid other issues, for example with quotes and greater-than or equals signs.
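As a rough sketch of the CDATA suggestion, the following stores a multi-line rule in a Formula element and runs it. It assumes the standard-library xml.etree.ElementTree and textwrap modules as found in CPython; under IronPython a .NET XML API might be used instead. The names RULES_XML, load_rules, run_rule, pricing_level, and result are hypothetical, chosen only for the illustration.

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical rules file: the CDATA section keeps line breaks and
# indentation intact and needs no escaping of <, > or quotes.
RULES_XML = """<Rules>
  <Rule Name="Markup">
    <Formula><![CDATA[
      if pricing_level > 2:
          result = (pricing_level + 1) * 0.05
      else:
          result = 0.0
    ]]></Formula>
  </Rule>
</Rules>"""


def load_rules(xml_text):
    """Return a dict mapping each rule name to a compiled code object."""
    rules = {}
    for rule in ET.fromstring(xml_text).iter("Rule"):
        # dedent removes the indentation the XML layout added to every line.
        source = textwrap.dedent(rule.findtext("Formula"))
        rules[rule.get("Name")] = compile(source, rule.get("Name"), "exec")
    return rules


def run_rule(code, **variables):
    """Execute a compiled rule and return the value it assigned to result."""
    namespace = dict(variables)
    exec(code, namespace)
    return namespace["result"]


if __name__ == "__main__":
    rules = load_rules(RULES_XML)
    print(run_rule(rules["Markup"], pricing_level=3))
    print(run_rule(rules["Markup"], pricing_level=1))
```

The dedent step is what lets the rule body be indented to match the surrounding XML without confusing Python.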
Apart from the already mentioned CDATA sections, there's pindent.py, which can, among other things, fix broken indentation based on comments a la #end if. To quote the linked file:
When called as "pindent -r" it assumes its input is a Python program with block-closing comments but with its indentation messed up, and outputs a properly indented version.
...
A "block-closing comment" is a comment of the form '# end <keyword>' where <keyword> is the keyword that opened the block. If the opening keyword is 'def' or 'class', the function or class name may be repeated in the block-closing comment as well. Here is an example of a program fully augmented with block-closing comments:
def foobar(a, b):
    if a == b:
        a = a+1
    elif a < b:
        b = b-1
        if b > a: a = a-1
        # end if
    else:
        print 'oops!'
    # end if
# end def foobar
It's bundled with CPython, but if IronPython doesn't have it, just grab it from the repository.
Just wondering...
I find using escape characters too distracting. I'd rather do something like this (console code):
>>> print ^'Let's begin and end with sets of unlikely 2 chars and bingo!'^
Let's begin and end with sets of unlikely 2 chars and bingo!
Note the ' inside the string, and how this syntax would have no issue with it, or whatever else inside for basically all cases. Too bad markdown can't properly colorize it (yet), so I decided to <pre> it.
Sure, the ^ could be any other char, I'm not sure what would look/work better. That sounds good enough to me, tho.
Probably some other language already has a similar solution. And, just maybe, Python already has such a feature and I overlooked it. I hope this is the case.
But if it isn't, would it be too hard to, somehow, change Python's interpreter and be able to select an arbitrary (or even standardized) syntax for notating the strings?
I realize there are many ways to change statements and the whole syntax in general by using precompilers, but this is far more specific. And going any of those routes is what I call "too hard". I don't really need to do this so, again, I'm just wondering.
Python has this: use """ or ''' as the delimiters.
print '''Let's begin and end with sets of unlikely 2 chars and bingo'''
How often do you have both ''' and """ in a string?
OK, I'm aware that triple-quoted strings can serve as multiline comments. For example,
"""Hello, I am a
multiline comment"""
and
'''Hello, I am a
multiline comment'''
But technically speaking these are strings, correct?
I've googled and read the Python style guide, but I was unable to find a technical answer to why there is no formal implementation of multiline, /* */ type of comments. I have no problem using triple quotes, but I am a little curious as to what led to this design decision.
I doubt you'll get a better answer than, "Guido didn't feel the need for multi-line comments".
Guido has tweeted about this:
Python tip: You can use multi-line strings as multi-line comments. Unless used as docstrings, they generate no code! :-)
Multi-line comments are easily breakable. What if you have the following in a simple calculator program?
operation = ''
print("Pick an operation: +-*/")
# Get user input here
Try to comment that with a multi-line comment:
/*
operation = ''
print("Pick an operation: +-*/")
# Get user input here
*/
Oops, your string contains the end comment delimiter.
Triple-quoted text should NOT be considered multi-line comments; by convention, they are docstrings. They should describe what your code does and how to use it, and should not be used for things like commenting out blocks of code.
According to Guido, multiline comments in Python are just contiguous single-line comments (search for "block comments").
To comment blocks of code, I sometimes use the following pattern:
if False:
    # A bunch of code
This likely goes back to the core concept that there should be one obvious way to do a task. Additional comment styles add unnecessary complications and could decrease readability.
Well, triple quotes are used for multiline comments in docstrings. And # comments are used as inline comments, and people get used to it.
Most scripting languages don't have multiline comments either. Maybe that's the cause?
See PEP 0008, section Comments
And see if your Python editor offers a keyboard shortcut for block commenting. Emacs supports it, as does Eclipse; presumably most decent IDEs do.
From The Zen of Python:
There should be one-- and preferably only one --obvious way to do it.
To comment out a block of code in the PyCharm IDE:
Code | Comment with Line Comment
Windows or Linux: Ctrl + /
Mac OS: Command + /
Personally my comment style in say Java is like
/*
 * My multi-line comment in Java
 */
So having single-line-only comments isn't such a bad thing if your style is similar to the preceding example, because in comparison you'd have
#
# My multi-line comment in Python
#
VB.NET is also a language with single-line-only commenting, and personally I find it annoying, as comments end up looking less like comments and more like some kind of quote
'
' This is a VB.NET example
'
Single-line-only comments end up having less character-usage than multi-line comments, and are less likely to be escaped by some dodgy characters in a regex statement perhaps? I'd tend to agree with Ned though.
# This
# is
# a
# multi-line
# comment
Use comment block or search and replace (s/^/#/g) in your editor to achieve this.
I solved this by downloading a macro for my text editor (TextPad) that lets me highlight lines and it then inserts # at the first of each line. A similar macro removes the #'s. Some may ask why multiline is necessary but it comes in handy when you're trying to "turn off" a block of code for debugging purposes.
For anyone else looking for multi-line comments in Python - using the triple quote format can have some problematic consequences, as I've just learned the hard way. Consider this:
this_dict = {
    'name': 'Bob',
    """
    This is a multiline comment in the middle of a dictionary
    """
    'species': 'Cat'
}
The multi-line comment will be tucked into the next string, messing up the 'species' key. Better to just use # for comments.
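The effect is easy to check. In this small demonstration, which reuses the dictionary from the example, the "comment" is joined to 'species' because Python concatenates adjacent string literals, so the intended key never exists:

```python
# Adjacent string literals are concatenated at compile time, so the
# triple-quoted "comment" silently becomes part of the next key.
this_dict = {
    'name': 'Bob',
    """
    This is a multiline comment in the middle of a dictionary
    """
    'species': 'Cat'
}

print('species' in this_dict)  # prints False: the intended key is missing
for key in this_dict:
    print(repr(key))           # shows the key that was actually created
```

No error is raised, which is what makes this mistake hard to spot.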
The idea that there should be only one way to do a thing is contradicted by the coexistence of multiline and single-line strings, of switch/case and if, and of different forms of loops.
Multiline comments are a pretty common feature and, let's face it, the multiline string comment is a hack with negative side effects!
I have seen lots of code using the multiline comment trick, and even editors use it.
But I guess every language has its quirks that the devs insist on never fixing. I know such quirks from the Java side as well, which have been open since the late 90s, never to be fixed!
Because the # convention is a common one, and there really isn't anything you can do with a multiline comment that you can't with a #-sign comment. It's a historical accident, like the ancestry of /* ... */ comments going back to PL/I.
Assume that they were just considered unnecessary. Since it's so easy to just type #a comment, multiline comments can just consist of many single line comments.
For HTML, on the other hand, there's more of a need for multiliners. It's harder to keep typing <!--comments like this-->.
This is just a guess .. but
Because they are strings, they have some semantic value (the compiler doesn't get rid of them), therefore it makes sense for them to be used as docstrings. They actually become part of the AST, so extracting documentation becomes easier.
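The AST claim can be checked with the standard ast module. The sketch below assumes Python 3.8 or later (where string literals are parsed as ast.Constant nodes); the SOURCE snippet and the area function are invented for the illustration. It shows that both a docstring and a stray triple-quoted string appear in the AST as expression statements, but only the first one is picked up as documentation.

```python
import ast

# Hypothetical source: the first string is a docstring, the second is
# a triple-quoted string used as if it were a comment.
SOURCE = '''
def area(radius):
    """Return the area of a circle."""
    """This stray string is only an expression statement."""
    return 3.14159 * radius ** 2
'''

tree = ast.parse(SOURCE)
func = tree.body[0]

# Every bare string statement in the function body, docstring included.
string_statements = [
    node.value.value
    for node in func.body
    if isinstance(node, ast.Expr)
    and isinstance(node.value, ast.Constant)
    and isinstance(node.value.value, str)
]

namespace = {}
exec(SOURCE, namespace)

print(len(string_statements))      # both strings are in the AST
print(ast.get_docstring(func))     # only the first one is the docstring
print(namespace["area"].__doc__)   # and only it is kept on the function
```

This is consistent with Guido's remark quoted earlier that such strings generate no code unless they are used as docstrings.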
Besides, multiline comments are a bitch. Sorry to say, but regardless of the language, I don't use them for anything else than debugging purposes. Say you have code like this:
void someFunction()
{
    Something
    /*Some comments*/
    Something else
}
Then you find out that there is something in your code you can't fix with the debugger, so you start debugging it manually by commenting out smaller and smaller chunks of code with these multiline comments. This would then give the function:
void someFunction()
{ /*
    Something
    /*Some comments*/
    Something else*/
}
This is really irritating, because the inner */ ends the outer comment early and leaves the trailing */ as a syntax error.
Multiline comments using IDLE:
On Mac OS X, after selecting the code, comment a block of code with Ctrl+3 and uncomment using Ctrl+4.
On Windows, after selecting the code, comment a block of code with Ctrl+Alt+3 and uncomment using Ctrl+Alt+4.
I remember reading about one guy who would put his multi-line comments into a triple-quoted variable:
x = '''
This is my
super-long mega-comment.
Wow there are a lot of lines
going on here!
'''
This does take up a bit of memory, but it gives you multi-line comment functionality, and plus most editors will highlight the syntax for you :)
It's also easy to comment out code by simply wrapping it with
x = '''
and
'''