Python 2.x gotchas and landmines [closed]
The purpose of my question is to strengthen my knowledge base with Python and get a better picture of it, which includes knowing its faults and surprises. To keep things specific, I'm only interested in the CPython interpreter.
I'm looking for something similar to what I learned from my PHP landmines question, where some of the answers were well known to me but a couple were borderline horrifying.
Update:
Apparently one, maybe two, people are upset that I asked a question that's already partially answered outside of Stack Overflow. As some sort of compromise, here's the URL:
http://www.ferg.org/projects/python_gotchas.html
Note that one or two of the answers here go beyond what was written on the site referenced above.
Expressions in default arguments are calculated when the function is defined, not when it’s called.
Example: consider defaulting an argument to the current time:
>>> import time
>>> def report(when=time.time()):
...     print when
...
>>> report()
1210294387.19
>>> time.sleep(5)
>>> report()
1210294387.19
The when argument doesn't change. It is evaluated once, when you define the function, and it won't change until the application is restarted.
Strategy: you won't trip over this if you default arguments to None and then do something useful when you see it:
>>> def report(when=None):
...     if when is None:
...         when = time.time()
...     print when
...
>>> report()
1210294762.29
>>> time.sleep(5)
>>> report()
1210294772.23
Exercise: to make sure you've understood: why is this happening?
>>> def spam(eggs=[]):
...     eggs.append("spam")
...     return eggs
...
>>> spam()
['spam']
>>> spam()
['spam', 'spam']
>>> spam()
['spam', 'spam', 'spam']
>>> spam()
['spam', 'spam', 'spam', 'spam']
You should be aware of how class variables are handled in Python. Consider the following class hierarchy:
class AAA(object):
    x = 1

class BBB(AAA):
    pass

class CCC(AAA):
    pass
Now, check the output of the following code:
>>> print AAA.x, BBB.x, CCC.x
1 1 1
>>> BBB.x = 2
>>> print AAA.x, BBB.x, CCC.x
1 2 1
>>> AAA.x = 3
>>> print AAA.x, BBB.x, CCC.x
3 2 3
Surprised? You won't be if you remember that class variables are internally handled as dictionaries on the class objects. For read operations, if a variable name is not found in the dictionary of the current class, the parent classes are searched for it. So, here is the same code again, but with explanations:
# AAA: {'x': 1}, BBB: {}, CCC: {}
>>> print AAA.x, BBB.x, CCC.x
1 1 1
>>> BBB.x = 2
# AAA: {'x': 1}, BBB: {'x': 2}, CCC: {}
>>> print AAA.x, BBB.x, CCC.x
1 2 1
>>> AAA.x = 3
# AAA: {'x': 3}, BBB: {'x': 2}, CCC: {}
>>> print AAA.x, BBB.x, CCC.x
3 2 3
Same goes for handling class variables in class instances (treat this example as a continuation of the one above):
>>> a = AAA()
# a: {}, AAA: {'x': 3}
>>> print a.x, AAA.x
3 3
>>> a.x = 4
# a: {'x': 4}, AAA: {'x': 3}
>>> print a.x, AAA.x
4 3
Loops and lambdas (or any closure, really): variables are bound by name, not by value, so the name is looked up when the function is called, not when it is defined.
funcs = []
for x in range(5):
    funcs.append(lambda: x)
[f() for f in funcs]
# output:
# 4 4 4 4 4
A workaround is either creating a separate factory function (see the sketch after this example) or binding the loop variable as a default argument, which captures its current value:
funcs = []
for x in range(5):
    funcs.append(lambda x=x: x)
[f() for f in funcs]
# output:
# 0 1 2 3 4
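The separate-function workaround mentioned above can be sketched like this (make_getter is an illustrative name, not something from the original answer):

funcs = []

def make_getter(value):
    # each call creates a fresh scope, so every closure captures its own value
    return lambda: value

for x in range(5):
    funcs.append(make_getter(x))

[f() for f in funcs]
# output:
# 0 1 2 3 4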
Dynamic binding makes typos in your variable names surprisingly hard to find. It's easy to spend half an hour fixing a trivial bug.
EDIT: an example...
for item in some_list:
    ...  # lots of code
    ...  # more code
    for tiem in some_other_list:
        process(item)  # oops!
One of the biggest surprises I ever had with Python is this one:
a = ([42],)
a[0] += [43, 44]
This works as one might expect, except that it raises a TypeError after updating the first entry of the tuple! So a will be ([42, 43, 44],) after executing the += statement, but there will be an exception anyway. If, on the other hand, you try this
a = ([42],)
b = a[0]
b += [43, 44]
you won't get an error.
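One way to see why both things happen: a[0] += [43, 44] expands (roughly) into a mutate-then-store pair, and only the store back into the tuple fails. A minimal sketch of the equivalent steps:

a = ([42],)
tmp = a[0]
tmp += [43, 44]   # list += mutates in place, so a[0] is already [42, 43, 44]
a[0] = tmp        # raises TypeError: tuples don't support item assignment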
try:
    int("z")
except IndexError, ValueError:
    pass

The reason this doesn't work is that IndexError is the only exception type you're catching, and ValueError is the name of the variable the caught exception is assigned to.
Correct code to catch multiple exceptions is:
try:
    int("z")
except (IndexError, ValueError):
    pass
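If you also need the exception object while catching several types, Python 2.6+ accepts the as form (the only spelling that survives in Python 3); a minimal sketch:

try:
    int("z")
except (IndexError, ValueError) as e:
    print e   # the caught ValueError instance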
There was a lot of discussion on hidden language features a while back: hidden-features-of-python, where some pitfalls were mentioned (and some of the good stuff too).
Also you might want to check out Python Warts.
But for me, integer division's a gotcha:
>>> 5/2
2
You probably wanted:
>>> 5*1.0/2
2.5
If you really want integer (floor) division, you should write it explicitly:
>>> 5//2
2
As that will work with floats too (and it will work when you eventually go to Python 3):
>>> 5*1.0//2
2.0
GvR explains how integer division came to work how it does on the history of Python.
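If you want true division throughout a Python 2 module, the __future__ import changes / while leaving // alone:

>>> from __future__ import division
>>> 5/2
2.5
>>> 5//2
2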
Not including an __init__.py in your packages. That one still gets me sometimes.
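For reference, a minimal Python 2 package layout (mypackage and module_a are illustrative names); without the __init__.py, import mypackage.module_a fails with ImportError:

mypackage/
    __init__.py    # may be empty, but must exist in Python 2
    module_a.py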
List slicing has caused me a lot of grief. I actually consider the following behavior a bug.
Define a list x
>>> x = [10, 20, 30, 40, 50]
Access index 2:
>>> x[2]
30
As you expect.
Slice the list from index 2 and to the end of the list:
>>> x[2:]
[30, 40, 50]
As you expect.
Access index 7:
>>> x[7]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
Again, as you expect.
However, try to slice the list from index 7 until the end of the list:
>>> x[7:]
[]
???
The remedy is to add explicit bounds checks when using list slicing (a sketch follows below). I wish I'd just get an error instead; it would be much easier to debug.
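One way to fail loudly instead is a small guard of your own; a sketch (strict_tail is a hypothetical helper):

def strict_tail(seq, start):
    # raise instead of silently returning [] for an out-of-range start
    if start > len(seq):
        raise IndexError("start index %d out of range" % start)
    return seq[start:]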
The only gotcha/surprise I've dealt with is CPython's GIL. If for whatever reason you expect Python threads in CPython to run concurrently... well, they don't, and this is pretty well documented by the Python crowd and even Guido himself.
A long but thorough explanation of CPython threading and some of the things going on under the hood and why true concurrency with CPython isn't possible.
http://jessenoller.com/2009/02/01/python-threads-and-the-global-interpreter-lock/
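A classic way to watch the GIL in action (a sketch in the spirit of David Beazley's benchmark; timings vary by machine): splitting CPU-bound work across two threads is typically no faster, and often slower, than doing it serially in CPython:

import threading, time

def count(n):
    # pure CPU-bound busy loop; never releases the GIL for long
    while n > 0:
        n -= 1

start = time.time()
t1 = threading.Thread(target=count, args=(10000000,))
t2 = threading.Thread(target=count, args=(10000000,))
t1.start(); t2.start()
t1.join(); t2.join()
print time.time() - start   # usually no better than count(20000000) run alone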
James Dumay eloquently reminded me of another Python gotcha:
Not all of Python's “included batteries” are wonderful.
James’ specific example was the HTTP libraries: httplib, urllib, urllib2, urlparse, mimetools, and ftplib. Some of the functionality is duplicated, and some of the functionality you'd expect is completely absent, e.g. redirect handling. Frankly, it's horrible.
If I ever have to grab something via HTTP these days, I use the urlgrabber module forked from the Yum project.
Floats are not printed at full precision by default (without repr):
x = 1.0 / 3
y = 0.333333333333
print x #: 0.333333333333
print y #: 0.333333333333
print x == y #: False
repr prints too many digits:
print repr(x) #: 0.33333333333333331
print repr(y) #: 0.33333333333300003
print x == 0.3333333333333333 #: True
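The practical consequence: two floats arrived at by different routes shouldn't be compared with ==; compare against a tolerance instead (1e-9 here is purely illustrative):

>>> abs(x - y) < 1e-9
True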
Unintentionally mixing old-style and new-style classes can cause seemingly mysterious errors.
Say you have a simple class hierarchy consisting of superclass A and subclass B. When B is instantiated, A's constructor must be called first. The code below correctly does this:
class A(object):
    def __init__(self):
        self.a = 1

class B(A):
    def __init__(self):
        super(B, self).__init__()
        self.b = 1

b = B()
But if you forget to make A a new-style class and define it like this:
class A:
    def __init__(self):
        self.a = 1
you get this traceback:
Traceback (most recent call last):
  File "AB.py", line 11, in <module>
    b = B()
  File "AB.py", line 7, in __init__
    super(B, self).__init__()
TypeError: super() argument 1 must be type, not classobj
Two other questions relating to this issue are 489269 and 770134
def f():
    x += 1

x = 42
f()
results in an UnboundLocalError, because assigning to x anywhere in the function body makes x local, and local names are detected statically at compile time. A different example:
def f():
    print x
    x = 43

x = 42
f()
You cannot use locals()['x'] = whatever to change local variable values as you might expect.
This works:
>>> x = 1
>>> x
1
>>> locals()['x'] = 2
>>> x
2
BUT:
>>> def test():
...     x = 1
...     print x
...     locals()['x'] = 2
...     print x   # *** prints 1, not 2 ***
...
>>> test()
1
1
This actually burnt me in an answer here on SO, since I had tested it outside a function and got the change I wanted. Afterwards, I found it mentioned and contrasted to the case of globals() in "Dive Into Python." See example 8.12. (Though it does not note that the change via locals() will work at the top level as I show above.)
x += [...] is not the same as x = x + [...] when x is a list:
>>> x = y = [1,2,3]
>>> x = x + [4]
>>> x == y
False
>>> x = y = [1,2,3]
>>> x += [4]
>>> x == y
True
x = x + [4] creates a new list and rebinds x, while x += [4] modifies the existing list in place (see the sketch below).
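You can watch the difference with id(): += keeps the same list object, while x = x + [...] rebinds x to a new one:

>>> x = [1, 2, 3]
>>> before = id(x)
>>> x += [4]        # in place: x is still the same object
>>> id(x) == before
True
>>> x = x + [5]     # rebinds x to a freshly built list
>>> id(x) == before
False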
List repetition with nested lists
This caught me out today and wasted an hour of my time debugging:
>>> x = [[]]*5
>>> x[0].append(0)
# Expect x equals [[0], [], [], [], []]
>>> x
[[0], [0], [0], [0], [0]] # Oh dear
Explanation: Python list problem
Using class variables when you want instance variables. Most of the time this doesn't cause problems, but if it's a mutable value it causes surprises.
class Foo(object):
x = {}
But:
>>> f1 = Foo()
>>> f2 = Foo()
>>> f1.x['a'] = 'b'
>>> f2.x
{'a': 'b'}
You almost always want instance variables, which require you to assign inside __init__:
class Foo(object):
def __init__(self):
self.x = {}
Python 2 has some surprising behaviour with comparisons:
>>> print x
0
>>> print y
1
>>> x < y
False
What's going on? repr() to the rescue:
>>> print "x: %r, y: %r" % (x, y)
x: '0', y: 1
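The False comes from Python 2's cross-type comparison rules, under which any number compares as smaller than any string, regardless of value:

>>> '0' < 1
False
>>> 999999 < '0'
True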
If you assign to a variable inside a function, Python assumes that the variable is defined inside that function:
>>> x = 1
>>> def increase_x():
...     x += 1
...
>>> increase_x()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in increase_x
UnboundLocalError: local variable 'x' referenced before assignment
Use global x (or nonlocal x in Python 3) to declare you want to set a variable defined outside your function.
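With the declaration in place, the function behaves as intended:

>>> x = 1
>>> def increase_x():
...     global x
...     x += 1
...
>>> increase_x()
>>> x
2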
The values of range(end_val) are not only strictly smaller than end_val, but strictly smaller than int(end_val). With a float argument to range (here via the future package's backported range), this might be an unexpected result:

>>> from future.builtins import range
>>> list(range(2.89))
[0, 1]
Due to 'truthiness' this makes sense:

>>> bool(1)
True

but you might not expect it to go the other way:

>>> float(True)
1.0
This can be a gotcha if you're converting strings to numeric values and your data contains True/False values (see the note below).
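A related parsing trap: bool() on a string tests emptiness, not meaning, so bool('False') is True. A sketch of a safer conversion (parse_bool is a hypothetical helper):

>>> bool("False")   # non-empty string, therefore truthy
True

def parse_bool(s):
    # map the literal text instead of calling bool()
    return {"True": True, "False": False}[s]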
If you create a list of lists this way:
arr = [[2]] * 5
print arr
[[2], [2], [2], [2], [2]]
Then this creates a list whose five elements all point to the same inner list! This might create real confusion. Consider this:

arr[0][0] = 5

then, if you print arr:
print arr
[[5], [5], [5], [5], [5]]
The proper way of initializing the list is, for example, with a list comprehension:
arr = [[2] for _ in range(5)]
arr[0][0] = 5
print arr
[[5], [2], [2], [2], [2]]
Related
Class variable vs instance variable
While learning Python through the Python docs, I came across the following, where it's explained that a class variable is common to the class and that any object can change it:

Sample Code 1:

class Dog:
    tricks = []             # mistaken use of a class variable
    def __init__(self, name):
        self.name = name
    def add_trick(self, trick):
        self.tricks.append(trick)

Output:

>>> d = Dog('Fido')
>>> e = Dog('Buddy')
>>> d.add_trick('roll over')
>>> e.add_trick('play dead')
>>> d.tricks                # unexpectedly shared by all dogs
['roll over', 'play dead']

Question: if so, then why doesn't y in the following example get affected when x changes its tricks attribute to 5?

Sample Code 2:

class Complex:
    tricks = 3
    def __init__(self, var1):
        self.tricks = var1
    def add_tricks(self, var1):
        self.tricks = var1

x = Complex(11)
y = Complex(12)
print (x.tricks)
print (y.tricks)
x.add_tricks(5)
print (x.tricks)
print (y.tricks)    # remains unchanged

Output:

11
12
5
12

And what exactly is the difference when I remove the self in the following program:

Sample Code 3:

class Complex:
    tricks = 3
    def __init__(self, var1):
        self.tricks = var1
    def add_tricks(self, var1):
        tricks = var1

x = Complex(11)
y = Complex(12)
print (x.tricks)
print (y.tricks)
x.add_tricks(5)     # this change is not reflected anywhere
print (x.tricks)
print (y.tricks)
print (Complex.tricks)

Output:

11
12
11
12
3
This example may be illustrative. Given the following class (I've dropped the initialiser from your example because it doesn't let us demonstrate the behaviour):

class Complex:
    tricks = 3
    def add_tricks(self, value):
        self.tricks = value

We can see, upon creation, the value of the tricks attribute is 3 for both objects:

>>> a = Complex()
>>> b = Complex()
>>> a.tricks
3
>>> b.tricks
3

Let's take a second and look at the names defined on those objects:

>>> a.__dict__
{}
>>> b.__dict__
{}

They're both objects with no attributes of their own. Let's see what happens after we call add_tricks on b:

>>> b.add_tricks(5)
>>> a.tricks
3
>>> b.tricks
5

Okay. So this looks like the shared value hasn't been affected. Let's take a look at their names again:

>>> a.__dict__
{}
>>> b.__dict__
{'tricks': 5}

And there it is. Assigning to self.tricks creates an attribute local to that object with the name tricks, which, when accessed via the object (or self), is the one that will be used from that point forward. The shared value is still there and unchanged:

>>> a.__class__.tricks
3
>>> b.__class__.tricks
3

It's just on the class, not on the object.
How does a comparator work for objects that are not comparable in Python?
I have defined a list as below:

list = [1, 3, 2, [4, 5, 6]]

then defined a comparator method as below:

def reverseCom(x, y):
    if x > y:
        return -1
    elif x < y:
        return 1
    else:
        return 0

Now I have sorted the list using reverseCom:

list.sort(reverseCom)
print list

Result: [[4, 5, 6], 3, 2, 1]

The element [4, 5, 6] is not comparable with the other elements of the list, so why doesn't this throw an error? Can you help me understand how sort works with a user-defined comparator in Python?
This is a Python 2 quirk. In Python 2, numeric and non-numeric values are comparable, and numeric values are always considered to be less than container objects:

>>> 1 < [1]
True
>>> 1 < [2]
True
>>> 1558 < [1]
True
>>> 1 < {}
True

When comparing two container values of different types, on the other hand, it is the name of their type that is taken into consideration:

>>> () < []
False
>>> 'tuple' < 'list'
False
>>> {} < []
True
>>> 'dict' < 'list'
True

This feature, however, has been dropped in Python 3, which made numeric and non-numeric values no longer comparable:

>>> 1 < [1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < list()

EDIT: this next explanation is fully experimentation-based, and I couldn't find sound documentation to back it up. If anyone does find it, I'd be glad to read through it.

It appears Python 2 has even more rules when it comes to comparison of user-defined objects and other non-container objects. In this case it appears that numeric values are always greater than non-numeric, non-container values:

>>> class A: pass
...
>>> a = A()
>>> 1 > a
True
>>> 2.7 > a
True

Now, when comparing two objects of different, non-numeric, non-container types, it seems that it is their address that is taken into account:

>>> class A: pass
...
>>> class B: pass
...
>>> a = A()
>>> a
<__main__.A instance at 0x0000000002265348>
>>> b = B()
>>> b
<__main__.B instance at 0x0000000002265048>
>>> a < b
False
>>> b < a
True

Which is really bananas, if you ask me. Of course, all of that can be changed if you care to override the __lt__() and __gt__() methods inside your class definition, which determine the standard behaviour of the < and > operators. Further documentation on how these methods operate can be found here.

Bottom line: avoid comparison between different types as much as you can. The result is really unpredictable, unintuitive and not all that well documented. Also, use Python 3 whenever possible.
Your comparator actually works, i.e., it does not throw any error:

In [9]: reverseCom([4,5,6], 1)
Out[9]: -1
In [10]: reverseCom([4,5,6], 2)
Out[10]: -1
In [11]: reverseCom([4,5,6], 3)
Out[11]: -1

The reason it works is that in Python 2, list instances are always bigger than int instances:

In [12]: [1,2,3] > 5
Out[12]: True
In [13]: ['hello'] > 5
Out[13]: True
In [14]: [] > -1
Out[14]: True
Python closures using lambda
I saw the piece of code below in a tutorial and am wondering how it works. Generally a lambda takes an input and returns something, but here it does not take anything and still works.

>>> for i in range(3):
...     a.append(lambda: i)
...
>>> a
[<function <lambda> at 0x028930B0>, <function <lambda> at 0x02893030>, <function <lambda> at 0x028930F0>]
lambda: i defines the constant function that returns i. Try this:

>>> f = lambda: 3
>>> f()
3

You get the value 3. But there's something more going on. Try this:

>>> a = 4
>>> g = lambda: a
>>> g()
4

gives you 4. But after a = 5, g() returns 5. Python functions "remember" the environment in which they're defined. This environment is called a "closure". By modifying the data in the closure (e.g. the variable a in the second example) you can change the behaviour of the functions defined in that closure.
In this case a is a list of function objects defined in the loop, each of which will return 2:

>>> a[0]()
2

To make these function objects remember the values of i sequentially, you should rewrite the code to

>>> for i in range(3):
...     a.append(lambda x=i: x)
...

which gives you

>>> a[0]()
0
>>> a[1]()
1
>>> a[2]()
2

But in this case you get a side effect that allows you to bypass the remembered value:

>>> a[0](42)
42
I'm not sure what you mean by "it works"; it appears that it doesn't work at all. In the case you have presented, i is a global variable. It changes every time the loop iterates, so after the loop, i == 2. Now, since each lambda function simply says lambda: i, each function call will simply return the most recent value of i. For example:

>>> a = []
>>> for i in range(3):
...     a.append(lambda: i)
...
>>> print a[0]()
2
>>> print a[1]()
2
>>> print a[2]()
2

In other words, this does not likely do what you expect it to do.
lambda defines an anonymous inline function. These functions are limited compared to the full functions you can define with def: they can't contain assignments, and they just return the value of a single expression. However, you can run into interesting issues with them: defining an ordinary function inside a loop is not common, but lambda functions are often put into loops, and this can create closure issues. The following:

>>> a = []
>>> for i in range(3):
...     a.append(lambda: i)

adds three functions (which are first-class objects in Python) to a. These functions return the value of i, but they use the binding of i as it exists when they are called, which after the loop is its final value. Therefore, you can call any of these functions:

>>> a[0]()
2
>>> a[1]()
2
>>> a[2]()
2

and they will each return 2, the last value of i produced by the loop. If you want each to return a different number, use a default argument:

>>> for i in range(3):
...     a.append(lambda i=i: i)

This forcibly gives each function the i as it was at that specific point during execution:

>>> a[0]()
0
>>> a[1]()
1
>>> a[2]()
2

Of course, since we're now able to pass an argument to the function, we can also do this:

>>> a[0](5)
5

It all depends on what you're planning to do with it.
Python for loop initialization of reference to function
I have this sample code where I expected it to print the current test.x values, but when I use a for loop to define a list of function references I am not getting what I expect: I get [1, 1] and [0, 0] instead of the expected [0, 1] and [1, 0]. I do get what I expect when I use the commented lines instead. I realize that there are easier ways to do this, but in my program I need to define the rules object in a for loop rather than one element per line, because I don't know how large the rules object will be. Thanks for any help. (Python 2.7)

class TestClass:
    def __init__(self):
        self.x = list([0, 1])
    def get_value(self, i):
        return self.x[i]

test = TestClass()
rules = list([None, None])
for a in range(2):
    rules[a] = lambda t: test.get_value(a)
#rules[0] = lambda t: test.get_value(0)
#rules[1] = lambda t: test.get_value(1)

print(rules[0](0), rules[1](0))
test.x[0] = 1
test.x[1] = 0
print(rules[0](0), rules[1](0))
The problem can be shown more concisely as follows:

>>> rules = list([None, None])
>>> for a in range(2):
...     rules[a] = lambda t: a
...
>>> rules[0](0)
1
>>> rules[0](1)
1
>>> rules[1](0)
1
>>> rules[1](1)
1

I think the problem is that the code always reflects the final value of a. This is known as "late-binding closures" and is discussed in the Python Guide here. One (rather ugly) way of getting round this is to create the new function each time by partially applying a function using the functools package. This "captures" the current value of a:

>>> from functools import partial
>>> for a in range(2):
...     def get(t, x): return x
...     rules[a] = partial(get, x=a)
...
>>> rules[0](0)
0
>>> rules[0](1)
0
>>> rules[1](0)
1
>>> rules[1](1)
1

A simpler way of achieving the same effect:

>>> for a in range(2):
...     rules[a] = lambda t, a=a: a

As shown in the linked Python Guide, you can also use a list comprehension to simplify the code a little:

rules = [lambda t, a=a: a for a in range(2)]
Hidden features of Python [closed]
What are the lesser-known but useful features of the Python programming language?

Try to limit answers to Python core.
One feature per answer.
Give an example and short description of the feature, not just a link to documentation.
Label the feature using a title as the first line.

Quick links to answers:

Argument Unpacking
Braces
Chaining Comparison Operators
Decorators
Default Argument Gotchas / Dangers of Mutable Default arguments
Descriptors
Dictionary default .get value
Docstring Tests
Ellipsis Slicing Syntax
Enumeration
For/else
Function as iter() argument
Generator expressions
import this
In Place Value Swapping
List stepping
__missing__ items
Multi-line Regex
Named string formatting
Nested list/generator comprehensions
New types at runtime
.pth files
ROT13 Encoding
Regex Debugging
Sending to Generators
Tab Completion in Interactive Interpreter
Ternary Expression
try/except/else
Unpacking+print() function
with statement
Chaining comparison operators:

>>> x = 5
>>> 1 < x < 10
True
>>> 10 < x < 20
False
>>> x < 10 < x*10 < 100
True
>>> 10 > x <= 9
True
>>> 5 == x > 4
True

In case you're thinking it's doing 1 < x, which comes out as True, and then comparing True < 10, which is also True: no, that's really not what happens (see the last example). It really translates into 1 < x and x < 10, and x < 10 and 10 < x*10 and x*10 < 100, but with less typing, and each term is only evaluated once.
Get the python regex parse tree to debug your regex.

Regular expressions are a great feature of python, but debugging them can be a pain, and it's all too easy to get a regex wrong. Fortunately, python can print the regex parse tree, by passing the undocumented, experimental, hidden flag re.DEBUG (actually, 128) to re.compile.

>>> re.compile("^\[font(?:=(?P<size>[-+][0-9]{1,2}))?\](.*?)[/font]", re.DEBUG)
at at_beginning
literal 91
literal 102
literal 111
literal 110
literal 116
max_repeat 0 1
  subpattern None
    literal 61
    subpattern 1
      in
        literal 45
        literal 43
      max_repeat 1 2
        in
          range (48, 57)
literal 93
subpattern 2
  min_repeat 0 65535
    any None
in
  literal 47
  literal 102
  literal 111
  literal 110
  literal 116

Once you understand the syntax, you can spot your errors. There we can see that I forgot to escape the [] in [/font]. Of course you can combine it with whatever flags you want, like commented regexes:

>>> re.compile("""
 ^              # start of a line
 \[font         # the font tag
 (?:=(?P<size>  # optional [font=+size]
 [-+][0-9]{1,2} # size specification
 ))?
 \]             # end of tag
 (.*?)          # text between the tags
 \[/font\]      # end of the tag
 """, re.DEBUG|re.VERBOSE|re.DOTALL)
enumerate

Wrap an iterable with enumerate and it will yield the item along with its index. For example:

>>> a = ['a', 'b', 'c', 'd', 'e']
>>> for index, item in enumerate(a): print index, item
...
0 a
1 b
2 c
3 d
4 e

References: Python tutorial—looping techniques; Python docs—built-in functions—enumerate; PEP 279.
Creating generator objects

If you write

x = (n for n in foo if bar(n))

you can get the generator out and assign it to x. Now it means you can do

for n in x:

The advantage of this is that you don't need intermediate storage, which you would need if you did

x = [n for n in foo if bar(n)]

In some cases this can lead to a significant speed-up. You can append many if statements to the end of the generator, basically replicating nested for loops:

>>> n = ((a,b) for a in range(0,2) for b in range(4,6))
>>> for i in n:
...     print i
(0, 4)
(0, 5)
(1, 4)
(1, 5)
iter() can take a callable argument

For instance:

def seek_next_line(f):
    for c in iter(lambda: f.read(1), '\n'):
        pass

The iter(callable, until_value) function repeatedly calls callable and yields its result until until_value is returned.
Be careful with mutable default arguments

>>> def foo(x=[]):
...     x.append(1)
...     print x
...
>>> foo()
[1]
>>> foo()
[1, 1]
>>> foo()
[1, 1, 1]

Instead, you should use a sentinel value denoting "not given" and replace it with the mutable you'd like as default:

>>> def foo(x=None):
...     if x is None:
...         x = []
...     x.append(1)
...     print x
>>> foo()
[1]
>>> foo()
[1]
Sending values into generator functions.

For example, having this function:

def mygen():
    """Yield 5 until something else is passed back via send()"""
    a = 5
    while True:
        f = (yield a)   # yield a and possibly get f in return
        if f is not None:
            a = f       # store the new value

You can:

>>> g = mygen()
>>> g.next()
5
>>> g.next()
5
>>> g.send(7)   # we send this back to the generator
7
>>> g.next()    # now it will yield 7 until we send something else
7
If you don't like using whitespace to denote scopes, you can use the C-style {} by issuing:

from __future__ import braces
The step argument in slice operators. For example:

>>> a = [1,2,3,4,5]
>>> a[::2]    # iterate over the whole list in 2-increments
[1, 3, 5]

The special case x[::-1] is a useful idiom for 'x reversed':

>>> a[::-1]
[5, 4, 3, 2, 1]
Decorators

Decorators allow you to wrap a function or method in another function that can add functionality, modify arguments or results, etc. You write decorators one line above the function definition, beginning with an "at" sign (@).

The example shows a print_args decorator that prints the decorated function's arguments before calling it:

>>> def print_args(function):
...     def wrapper(*args, **kwargs):
...         print 'Arguments:', args, kwargs
...         return function(*args, **kwargs)
...     return wrapper
>>> @print_args
... def write(text):
...     print text
>>> write('foo')
Arguments: ('foo',) {}
foo
The for...else syntax (see http://docs.python.org/ref/for.html)

for i in foo:
    if i == 0:
        break
else:
    print("i was never 0")

The "else" block will normally be executed at the end of the for loop, unless the break is called. The above code could be emulated as follows:

found = False
for i in foo:
    if i == 0:
        found = True
        break
if not found:
    print("i was never 0")
From 2.5 onwards, dicts have a special method __missing__ that is invoked for missing items:

>>> class MyDict(dict):
...     def __missing__(self, key):
...         self[key] = rv = []
...         return rv
...
>>> m = MyDict()
>>> m["foo"].append(1)
>>> m["foo"].append(2)
>>> dict(m)
{'foo': [1, 2]}

There is also a dict subclass in collections called defaultdict that does pretty much the same but calls a function without arguments for non-existing items:

>>> from collections import defaultdict
>>> m = defaultdict(list)
>>> m["foo"].append(1)
>>> m["foo"].append(2)
>>> dict(m)
{'foo': [1, 2]}

I recommend converting such dicts to regular dicts before passing them to functions that don't expect such subclasses. A lot of code uses d[a_key] and catches KeyError to check if an item exists, which would add a new item to the dict.
In-place value swapping

>>> a = 10
>>> b = 5
>>> a, b
(10, 5)
>>> a, b = b, a
>>> a, b
(5, 10)

The right-hand side of the assignment is an expression that creates a new tuple. The left-hand side of the assignment immediately unpacks that (unreferenced) tuple to the names a and b. After the assignment, the new tuple is unreferenced and marked for garbage collection, and the values bound to a and b have been swapped.

As noted in the Python tutorial section on data structures, multiple assignment is really just a combination of tuple packing and sequence unpacking.
Readable regular expressions

In Python you can split a regular expression over multiple lines, name your matches and insert comments.

Example verbose syntax (from Dive into Python):

>>> pattern = """
... ^                   # beginning of string
... M{0,4}              # thousands - 0 to 4 M's
... (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
...                     #            or 500-800 (D, followed by 0 to 3 C's)
... (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
...                     #        or 50-80 (L, followed by 0 to 3 X's)
... (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
...                     #        or 5-8 (V, followed by 0 to 3 I's)
... $                   # end of string
... """
>>> re.search(pattern, 'M', re.VERBOSE)

Example naming matches (from the Regular Expression HOWTO):

>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search('(((( Lots of punctuation )))')
>>> m.group('word')
'Lots'

You can also verbosely write a regex without using re.VERBOSE, thanks to string literal concatenation:

>>> pattern = (
...     "^"                 # beginning of string
...     "M{0,4}"            # thousands - 0 to 4 M's
...     "(CM|CD|D?C{0,3})"  # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
...                         #            or 500-800 (D, followed by 0 to 3 C's)
...     "(XC|XL|L?X{0,3})"  # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
...                         #        or 50-80 (L, followed by 0 to 3 X's)
...     "(IX|IV|V?I{0,3})"  # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
...                         #        or 5-8 (V, followed by 0 to 3 I's)
...     "$"                 # end of string
... )
>>> print pattern
^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$
Function argument unpacking

You can unpack a list or a dictionary as function arguments using * and **. For example:

def draw_point(x, y):
    # do some magic
    pass

point_foo = (3, 4)
point_bar = {'y': 3, 'x': 2}

draw_point(*point_foo)
draw_point(**point_bar)

A very useful shortcut, since lists, tuples and dicts are widely used as containers.
ROT13 is a valid encoding for source code, when you use the right coding declaration at the top of the code file:

#!/usr/bin/env python
# -*- coding: rot13 -*-

cevag "Uryyb fgnpxbiresybj!".rapbqr("rot13")
Creating new types in a fully dynamic manner

>>> NewType = type("NewType", (object,), {"x": "hello"})
>>> n = NewType()
>>> n.x
'hello'

which is exactly the same as

>>> class NewType(object):
...     x = "hello"
>>> n = NewType()
>>> n.x
'hello'

Probably not the most useful thing, but nice to know.
Context managers and the "with" statement

Introduced in PEP 343, a context manager is an object that acts as a run-time context for a suite of statements. Since the feature makes use of new keywords, it was introduced gradually: it is available in Python 2.5 via the __future__ directive. Python 2.6 and above (including Python 3) has it available by default.

I have used the "with" statement a lot because I think it's a very useful construct. Here is a quick demo:

from __future__ import with_statement

with open('foo.txt', 'w') as f:
    f.write('hello!')

What's happening behind the scenes is that the "with" statement calls the special __enter__ and __exit__ methods on the file object. Exception details are also passed to __exit__ if any exception was raised from the with statement body, allowing for exception handling to happen there.

What this does for you in this particular case is that it guarantees that the file is closed when execution falls out of the scope of the with suite, regardless of whether that occurs normally or whether an exception was thrown. It is basically a way of abstracting away common exception-handling code.

Other common use cases for this include locking with threads and database transactions.
Dictionaries have a get() method

If you do d['key'] and key isn't there, you get an exception. If you do d.get('key'), you get back None if 'key' isn't there. You can add a second argument to get that item back instead of None, e.g. d.get('key', 0).

It's great for things like adding up numbers:

sum[value] = sum.get(value, 0) + 1
Descriptors

They're the magic behind a whole bunch of core Python features.

When you use dotted access to look up a member (e.g., x.y), Python first looks for the member in the instance dictionary. If it's not found, it looks for it in the class dictionary. If it finds it in the class dictionary, and the object implements the descriptor protocol, instead of just returning it, Python executes it. A descriptor is any class that implements the __get__, __set__, or __delete__ methods.

Here's how you'd implement your own (read-only) version of property using descriptors:

class Property(object):
    def __init__(self, fget):
        self.fget = fget
    def __get__(self, obj, type):
        if obj is None:
            return self
        return self.fget(obj)

and you'd use it just like the built-in property():

class MyClass(object):
    @Property
    def foo(self):
        return "Foo!"

Descriptors are used in Python to implement properties, bound methods, static methods, class methods and slots, amongst other things. Understanding them makes it easy to see why a lot of things that previously looked like Python 'quirks' are the way they are.

Raymond Hettinger has an excellent tutorial that does a much better job of describing them than I do.
Conditional Assignment

x = 3 if (y == 1) else 2

It does exactly what it sounds like: "assign 3 to x if y is 1, otherwise assign 2 to x". Note that the parens are not necessary, but I like them for readability. You can also chain it if you have something more complicated:

x = 3 if (y == 1) else 2 if (y == -1) else 1

Though at a certain point, it goes a little too far.

Note that you can use if ... else in any expression. For example:

(func1 if y == 1 else func2)(arg1, arg2)

Here func1 will be called if y is 1, and func2 otherwise. In both cases the corresponding function will be called with arguments arg1 and arg2.

Analogously, the following is also valid:

x = (class1 if y == 1 else class2)(arg1, arg2)

where class1 and class2 are two classes.
Doctest: documentation and unit-testing at the same time.

Example extracted from the Python documentation:

def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    If the result is small enough to fit in an int, return an int.
    Else return a long.

    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: n must be >= 0

    Factorials of floats are OK, but the float must be an exact integer:
    """
    import math
    if not n >= 0:
        raise ValueError("n must be >= 0")
    if math.floor(n) != n:
        raise ValueError("n must be exact integer")
    if n+1 == n:  # catch a value like 1e300
        raise OverflowError("n too large")
    result = 1
    factor = 2
    while factor <= n:
        result *= factor
        factor += 1
    return result

def _test():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _test()
Named formatting

%-formatting takes a dictionary (and also applies %i/%s etc. validation):

>>> print "The %(foo)s is %(bar)i." % {'foo': 'answer', 'bar': 42}
The answer is 42.

>>> foo, bar = 'question', 123
>>> print "The %(foo)s is %(bar)i." % locals()
The question is 123.

And since locals() is also a dictionary, you can simply pass that as a dict and have %-substitutions from your local variables. I think this is frowned upon, but it simplifies things.

New-style formatting:

>>> print("The {foo} is {bar}".format(foo='answer', bar=42))
To add more python modules (especially 3rd party ones), most people seem to use PYTHONPATH environment variables, or they add symlinks or directories to their site-packages directories. Another way is to use *.pth files. Here's the official python docs' explanation:

"The most convenient way [to modify python's search path] is to add a path configuration file to a directory that's already on Python's path, usually to the .../site-packages/ directory. Path configuration files have an extension of .pth, and each line must contain a single path that will be appended to sys.path. (Because the new paths are appended to sys.path, modules in the added directories will not override standard modules. This means you can't use this mechanism for installing fixed versions of standard modules.)"
Exception else clause:

try:
    put_4000000000_volts_through_it(parrot)
except Voom:
    print "'E's pining!"
else:
    print "This parrot is no more!"
finally:
    end_sketch()

The use of the else clause is better than adding additional code to the try clause because it avoids accidentally catching an exception that wasn't raised by the code being protected by the try ... except statement.

See http://docs.python.org/tut/node10.html
Re-raising exceptions:

# Python 2 syntax
try:
    some_operation()
except SomeError, e:
    if is_fatal(e):
        raise
    handle_nonfatal(e)

# Python 3 syntax
try:
    some_operation()
except SomeError as e:
    if is_fatal(e):
        raise
    handle_nonfatal(e)

The 'raise' statement with no arguments inside an error handler tells Python to re-raise the exception with the original traceback intact, allowing you to say "oh, sorry, sorry, I didn't mean to catch that, sorry, sorry."

If you wish to print, store or fiddle with the original traceback, you can get it with sys.exc_info(), and printing it like Python would is done with the 'traceback' module.
Main messages :)

import this
# btw look at this module's source :)

De-cyphered:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Interactive Interpreter Tab Completion

try:
    import readline
except ImportError:
    print "Unable to load readline module."
else:
    import rlcompleter
    readline.parse_and_bind("tab: complete")

>>> class myclass:
...     def function(self):
...         print "my function"
...
>>> class_instance = myclass()
>>> class_instance.<TAB>
class_instance.__class__    class_instance.__module__
class_instance.__doc__      class_instance.function
>>> class_instance.f<TAB>unction()

You will also have to set a PYTHONSTARTUP environment variable.
Nested list comprehensions and generator expressions:

[(i, j) for i in range(3) for j in range(i)]
((i, j) for i in range(4) for j in range(i))

These can replace huge chunks of nested-loop code.
Operator overloading for the set builtin:

>>> a = set([1,2,3,4])
>>> b = set([3,4,5,6])
>>> a | b    # Union
{1, 2, 3, 4, 5, 6}
>>> a & b    # Intersection
{3, 4}
>>> a < b    # Subset
False
>>> a - b    # Difference
{1, 2}
>>> a ^ b    # Symmetric Difference
{1, 2, 5, 6}

More detail from the standard library reference: Set Types