Recently I've been writing a download program that uses the HTTP Range field to download many blocks at the same time. I wrote a Python class to represent the Range (the HTTP Range header describes a closed interval):
class ClosedRange:
    def __init__(self, begin, end):
        self.begin = begin
        self.end = end

    def __iter__(self):
        yield self.begin
        yield self.end

    def __str__(self):
        return '[{0.begin}, {0.end}]'.format(self)

    def __len__(self):
        return self.end - self.begin + 1
The __iter__ magic method is there to support tuple unpacking:
header = {'Range': 'bytes={}-{}'.format(*the_range)}
And len(the_range) is the number of bytes in that Range.
Now I've found that 'bytes={}-{}'.format(*the_range) occasionally causes a MemoryError. After some debugging I found that the CPython interpreter will try to call len(iterable) when executing func(*iterable), and (may) allocate memory based on that length. On my machine, when len(the_range) is greater than 1 GB, the MemoryError appears.
Here is a simplified example:
class C:
    def __iter__(self):
        yield 5

    def __len__(self):
        print('__len__ called')
        return 1024**3

def f(*args):
    return args
>>> c = C()
>>> f(*c)
__len__ called
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
>>> # BTW, `list(the_range)` has the same problem.
>>> list(c)
__len__ called
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
So my questions are:
Why does CPython call len(iterable)? From this question I see that you won't know an iterator's length until you iterate through it. Is this an optimization?
Can the __len__ method return a 'fake' length (i.e. not the real number of elements in memory) of an object?
Why does CPython call len(iterable)? From this question I see that you won't know an iterator's length until you iterate through it. Is this an optimization?
When Python (assuming Python 3) executes f(*c), the opcode CALL_FUNCTION_EX is used:
0 LOAD_GLOBAL 0 (f)
2 LOAD_GLOBAL 1 (c)
4 CALL_FUNCTION_EX 0
6 POP_TOP
As c is an iterable, PySequence_Tuple is called to convert it to a tuple, and PyObject_LengthHint is then called to estimate the length of the new tuple. Since the __len__ method is defined on c, it gets called, and its return value is used to allocate memory for the new tuple. The allocation fails, and a MemoryError is finally raised.
/* Guess result size and allocate space. */
n = PyObject_LengthHint(v, 10);
if (n == -1)
    goto Fail;
result = PyTuple_New(n);
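For what it's worth, PyObject_LengthHint is exposed to Python code as operator.length_hint (Python 3.4+), and it falls back to __len__ when no __length_hint__ is defined, so you can observe the same over-estimate directly (a quick check against the class C defined above):

>>> import operator
>>> operator.length_hint(c)   # falls back to c.__len__ since no __length_hint__ exists
__len__ called
1073741824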
Can the __len__ method return a 'fake' length (i.e. not the real number of elements in memory) of an object?
In this scenario, yes.
When the return value of __len__ is smaller than needed, Python will adjust the memory of the new tuple object to fit while filling it. If it is larger than needed, Python will allocate extra memory, but _PyTuple_Resize will be called in the end to reclaim the over-allocated space.
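Given that, one way to sidestep the MemoryError in the original ClosedRange class is simply not to report the byte count through __len__. A minimal sketch (the size property name is my own choice, not part of the original code):

class ClosedRange:
    def __init__(self, begin, end):
        self.begin = begin
        self.end = end

    def __iter__(self):
        yield self.begin
        yield self.end

    @property
    def size(self):
        # Number of bytes covered by the range. Kept out of __len__ so that
        # tuple(), list() and f(*the_range) never use it as an allocation hint.
        return self.end - self.begin + 1

With this, 'bytes={}-{}'.format(*the_range) still works via __iter__, and the byte count is available as the_range.size.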
I'd like to point to a function that does nothing:
def identity(*args):
    return args
my use case is something like this
try:
    gettext.find(...)
    ...
    _ = gettext.gettext
else:
    _ = identity
Of course, I could use the identity defined above, but a built-in would certainly run faster (and avoid bugs introduced by my own).
Apparently, map and filter use None for the identity, but this is specific to their implementations.
>>> _=None
>>> _("hello")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not callable
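For reference, the filter behaviour mentioned above looks like this: passing None means "use the element itself as the predicate", which is why it only works for that specific purpose (a quick Python 3 check):

>>> list(filter(None, [0, 1, '', 'a', None]))
[1, 'a']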
Doing some more research, there is none. The feature was asked for in issue 1673203, and Raymond Hettinger said there won't be one:
Better to let people write their own trivial pass-throughs
and think about the signature and time costs.
So a better way to do it is actually (a lambda avoids naming the function):
_ = lambda *args: args
advantage: takes any number of parameters
disadvantage: the result is a boxed version of the parameters
OR
_ = lambda x: x
advantage: doesn't change the type of the parameter
disadvantage: takes exactly 1 positional parameter
An identity function, as defined in https://en.wikipedia.org/wiki/Identity_function, takes a single argument and returns it unchanged:
def identity(x):
    return x
What you are asking for when you say you want the signature def identity(*args) is not strictly an identity function, as you want it to take multiple arguments. That's fine, but then you hit a problem as Python functions don't return multiple results, so you have to find a way of cramming all of those arguments into one return value.
The usual way of returning "multiple values" in Python is to return a tuple of the values - technically that's one return value but it can be used in most contexts as if it were multiple values. But doing that here means you get
>>> def mv_identity(*args):
...     return args
...
>>> mv_identity(1,2,3)
(1, 2, 3)
>>> # So far, so good. But what happens now with single arguments?
>>> mv_identity(1)
(1,)
And fixing that problem quickly gives other issues, as the various answers here have shown.
So, in summary, there's no identity function defined in Python because:
The formal definition (a single argument function) isn't that useful, and is trivial to write.
Extending the definition to multiple arguments is not well-defined in general, and you're far better off defining your own version that works the way you need it to for your particular situation.
For your precise case,
def dummy_gettext(message):
    return message
is almost certainly what you want - a function that has the same calling convention and return as gettext.gettext, which returns its argument unchanged, and is clearly named to describe what it does and where it's intended to be used. I'd be pretty shocked if performance were a crucial consideration here.
Yours will work fine. When the number of parameters is fixed, you can use an anonymous function like this:
lambda x: x
There is no built-in identity function in Python. An imitation of Haskell's id function would be:
identity = lambda x, *args: (x,) + args if args else x
Example usage:
>>> identity(1)
1
>>> identity(1, 2)
(1, 2)
Since identity does nothing except returning the given arguments, I do not think that it is slower than a native implementation would be.
No, there isn't.
Note that your identity is equivalent to lambda *args: args, and will box its args, i.e.
In [6]: id = lambda *args: args
In [7]: id(3)
Out[7]: (3,)
So, you may want to use lambda arg: arg if you want a true identity function.
NB: This example will shadow the built-in id function (which you will probably never use).
If the speed does not matter, this should handle all cases:
def identity(*args, **kwargs):
    if not args:
        if not kwargs:
            return None
        elif len(kwargs) == 1:
            return next(iter(kwargs.values()))
        else:
            return (*kwargs.values(),)
    elif not kwargs:
        if len(args) == 1:
            return args[0]
        else:
            return args
    else:
        return (*args, *kwargs.values())
Examples of usage:
>>> print(identity())
None
>>> identity(1)
1
>>> identity(1, 2)
(1, 2)
>>> identity(1, b=2)
(1, 2)
>>> identity(a=1, b=2)
(1, 2)
>>> identity(1, 2, c=3)
(1, 2, 3)
Stub of a single-argument function
gettext.gettext (the OP's example use case) accepts a single argument, message. If one needs a stub for it, there's no reason to return (message,) instead of message (which is what def identity(*args): return args would do). Thus both
_ = lambda message: message
def _(message):
    return message
fit perfectly.
...but a built-in would certainly run faster (and avoid bugs introduced by my own).
Bugs in such a trivial case are barely relevant. For an argument of predefined type, say str, we can use str() itself as an identity function (because of string interning it even retains object identity, see id note below) and compare its performance with the lambda solution:
$ python3 -m timeit -s "f = lambda m: m" "f('foo')"
10000000 loops, best of 3: 0.0852 usec per loop
$ python3 -m timeit "str('foo')"
10000000 loops, best of 3: 0.107 usec per loop
A micro-optimisation is possible. For example, the following Cython code:
test.pyx
cpdef str f(str message):
    return message
Then:
$ pip install runcython3
$ makecython3 test.pyx
$ python3 -m timeit -s "from test import f" "f('foo')"
10000000 loops, best of 3: 0.0317 usec per loop
Built-in object identity function
Don't confuse an identity function with the id built-in function, which returns the 'identity' of an object: a unique identifier for that particular object (its memory address in CPython) rather than the object's value, as compared with the == operator.
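To make the distinction concrete, here is a quick REPL illustration (the single-argument identity lambda is the one discussed above; the exact number printed by id() will differ on your machine):

>>> identity = lambda x: x
>>> s = 'hello'
>>> identity(s)        # the identity function returns the value unchanged
'hello'
>>> identity(s) is s   # in fact, the very same object
True
>>> id(s)              # id() instead returns a unique integer (a memory address in CPython)
139709873393136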
Lots of good answers and discussion are in this topic. I just want to note that, in the OP's case where the identity function takes a single argument, it doesn't matter compile-wise whether you use a lambda or define a function (though a named function is probably preferable for PEP 8 compliance). The bytecode is functionally identical:
import dis
function_method = compile("def identity(x):\n return x\ny=identity(Type('x', (), dict()))", "foo", "exec")
dis.dis(function_method)
1 0 LOAD_CONST 0 (<code object identity at 0x7f52cc30b030, file "foo", line 1>)
2 LOAD_CONST 1 ('identity')
4 MAKE_FUNCTION 0
6 STORE_NAME 0 (identity)
3 8 LOAD_NAME 0 (identity)
10 LOAD_NAME 1 (Type)
12 LOAD_CONST 2 ('x')
14 LOAD_CONST 3 (())
16 LOAD_NAME 2 (dict)
18 CALL_FUNCTION 0
20 CALL_FUNCTION 3
22 CALL_FUNCTION 1
24 STORE_NAME 3 (y)
26 LOAD_CONST 4 (None)
28 RETURN_VALUE
Disassembly of <code object identity at 0x7f52cc30b030, file "foo", line 1>:
2 0 LOAD_FAST 0 (x)
2 RETURN_VALUE
And for the lambda:
import dis
lambda_method = compile("identity = lambda x: x\ny=identity(Type('x', (), dict()))", "foo", "exec")
dis.dis(lambda_method)
1 0 LOAD_CONST 0 (<code object <lambda> at 0x7f52c9fbbd20, file "foo", line 1>)
2 LOAD_CONST 1 ('<lambda>')
4 MAKE_FUNCTION 0
6 STORE_NAME 0 (identity)
2 8 LOAD_NAME 0 (identity)
10 LOAD_NAME 1 (Type)
12 LOAD_CONST 2 ('x')
14 LOAD_CONST 3 (())
16 LOAD_NAME 2 (dict)
18 CALL_FUNCTION 0
20 CALL_FUNCTION 3
22 CALL_FUNCTION 1
24 STORE_NAME 3 (y)
26 LOAD_CONST 4 (None)
28 RETURN_VALUE
Disassembly of <code object <lambda> at 0x7f52c9fbbd20, file "foo", line 1>:
1 0 LOAD_FAST 0 (x)
2 RETURN_VALUE
Adding to all answers:
Notice there is an implicit convention in the Python stdlib: a HOF (higher-order function) that defaults its key parameter to the identity function interprets None as such.
E.g. sorted, heapq.merge, max, min, etc.
So it is not a bad idea to have your own HOF that expects a key follow the same pattern.
That is, instead of:
def my_hof(x, key=lambda _: _):
    ...
(which is totally right)
You could write:
def my_hof(x, key=None):
    if key is None: key = lambda _: _
    ...
If you want.
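For reference, this is how the standard library functions mentioned above behave when key is None (a quick REPL check):

>>> sorted([3, 1, 2], key=None)        # None means "compare the elements themselves"
[1, 2, 3]
>>> max(['bb', 'a', 'ccc'], key=None)
'ccc'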
The thread is pretty old, but I still wanted to post this.
It is possible to build an identity method for both arguments and objects. In the example below, objOut is an identity for objIn. None of the other examples above deal with **kwargs.
class test(object):
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

    def identity(self):
        return self

objIn = test('arg-1', 'arg-2', 'arg-3', 'arg-n', key1=1, key2=2, key3=3, keyn='n')
objOut = objIn.identity()
print('args=', objOut.args, 'kwargs=', objOut.kwargs)

# If you want just the arguments to be printed...
print(test('arg-1', 'arg-2', 'arg-3', 'arg-n', key1=1, key2=2, key3=3, keyn='n').identity().args)
print(test('arg-1', 'arg-2', 'arg-3', 'arg-n', key1=1, key2=2, key3=3, keyn='n').identity().kwargs)
$ py test.py
args= ('arg-1', 'arg-2', 'arg-3', 'arg-n') kwargs= {'key1': 1, 'keyn': 'n', 'key2': 2, 'key3': 3}
('arg-1', 'arg-2', 'arg-3', 'arg-n')
{'key1': 1, 'keyn': 'n', 'key2': 2, 'key3': 3}
It's totally unexpected (to me at least) that foo would know foo inside of the function def for foo. What the heck is going on here?
>>> def foo(x):
...     print "wow"
...     print globals().get('foo', 'sorry')
...     return foo
...
>>> f = foo(3)
wow
<function foo at 0x10135f8c0>
>>> f
<function foo at 0x10135f8c0>
Is this some sort of effect of python's lazy evaluation? It builds the function code first and puts it in globals, but then actually builds the function later when it's called? Wow... what form of python magic is this?
Of course, this makes for ease of recursion... and was probably the reason this is in the language...
>>> def bar(x):
...     return bar(x)
...
>>> bar(3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in bar
File "<stdin>", line 2, in bar
File "<stdin>", line 2, in bar
...snip...
File "<stdin>", line 2, in bar
File "<stdin>", line 2, in bar
RuntimeError: maximum recursion depth exceeded
Maybe dis can help...
>>> def foo(x):
...     print "wow"
...     print globals().get('foo', 'sorry')
...     return foo
...
>>> import dis
>>> dis.dis(foo)
2 0 LOAD_CONST 1 ('wow')
3 PRINT_ITEM
4 PRINT_NEWLINE
3 5 LOAD_GLOBAL 0 (globals)
8 CALL_FUNCTION 0
11 LOAD_ATTR 1 (get)
14 LOAD_CONST 2 ('foo')
17 LOAD_CONST 3 ('sorry')
20 CALL_FUNCTION 2
23 PRINT_ITEM
24 PRINT_NEWLINE
4 25 LOAD_GLOBAL 2 (foo)
28 RETURN_VALUE
>>>
Hmm. Complicated. So let's go simpler...
>>> def zap(x):
...     return zap
...
>>> dis.dis(zap)
2 0 LOAD_GLOBAL 0 (zap)
3 RETURN_VALUE
>>>
Yeah, it looks like the bytecode is built and holds instructions to load zap from globals. So the two-step process makes zap inside of zap not a special thing at all.
Let's see if we can dig into the process better and clarify...
>>> def blah(x):
...     def hlab(y):
...         return blah(x)
...     return hlab
...
>>> blah.func_code.co_consts
(None, <code object hlab at 0x10fcdfd30, file "<stdin>", line 2>)
>>> b = blah(4)
>>> b
<function hlab at 0x10fce9c80>
>>> dis.dis(blah.func_code.co_consts[-1])
3 0 LOAD_GLOBAL 0 (blah)
3 LOAD_DEREF 0 (x)
6 CALL_FUNCTION 1
9 RETURN_VALUE
>>> dis.dis(b)
3 0 LOAD_GLOBAL 0 (blah)
3 LOAD_DEREF 0 (x)
6 CALL_FUNCTION 1
9 RETURN_VALUE
>>> b_ = blah.func_code.co_consts[-1]
>>> b.func_code
<code object hlab at 0x10fcdfd30, file "<stdin>", line 2>
>>> b_
<code object hlab at 0x10fcdfd30, file "<stdin>", line 2>
>>>
So it looks like the bytecode is built first, then the function is built from that... which then points back to the original bytecode. I still don't see how it's hooked up, but I assume that's done on the "stack" somehow. This process, at least, would make whatever name references used inside of the function def not special at all (i.e. irrelevant if foo uses foo or blah or whatever).
So ok, I get it. Nothing special.
Although this is a bit odd...
>>> b(2)
<function hlab at 0x10faf9410>
>>> b(2)(2)
<function hlab at 0x10faf9578>
>>> b(2)(2)(2)
<function hlab at 0x10faf9410>
>>> _ is b(2)
False
...but I assume it's just cycling available memory addresses, or something like that.
Nothing really mysterious.
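One more quick check that the name inside the body is just an ordinary global lookup performed at call time: rebind the global name, and the already-created function picks up the new binding (my own illustration, not from the question):

>>> def foo():
...     return foo
...
>>> original = foo
>>> foo = "rebound"   # rebind the global name
>>> original()        # LOAD_GLOBAL (foo) now finds the string
'rebound'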
There is, I believe, a two-pass aspect to this, and you are doing a late evaluation in the body of the function.
Update: I qualified my explanation of what happens after the definition step - it looks as if some things do happen with the function code before it gets called. Wally's answer is better than mine in that respect.
Pass #1 - def foo(x) is found and set on the module's namespace. Probably adds the function parameters as well.
That's all, you don't drill down in the code.
Update: Actually, I am not so sure about the code not being processed in some way before the call. Looking at foo.func_code internals showed no great difference before and after the call.
Also, if you have a syntax error like bad parenthesis nesting or whitespace problems, that does show up before the function is even called, so something is picking it up ahead of the call. See my code at the end.
I'll guess that the code is parsed but any variable resolution gets deferred until actual execution.
Pass #2 - you call foo(x), it is found and called.
The function's code gets executed for the first time.
When you hit globals()["foo"], it picks up the existing reference stored in #1.
You can also see some hints of this behavior when you run coverage.py or similar. On import, all the outer definitions in a module are flagged as covered.
But the actual code only gets covered when you call it.
Another way to think of it is that you need that namespacing pass first to set up references, before proceeding further. Otherwise, in my code below, foo would not find bar.
Here's some code I played with to distinguish execution errors from syntax errors...
def foo(x):
    """comment/uncomment to see behavior"""
    pass
    # return bar(x)  # this works
    return bar2(x)  # call-time error
    # return bar(x) bad whitespace  # IndentationError, nothing runs

print "foo defined"

def bar(x):
    return x*2

print "calling foo#1"
try:
    print foo(3)
except Exception, e:
    print e

# let's make it so there is a bar2...
bar2 = bar

print "calling foo#2"
try:
    print foo(6)
except Exception, e:
    print e
When a function is called with argument unpacking, it seems to consume the recursion depth twice as fast. I would like to know why this happens.
Normally:
depth = 0

def f():
    global depth
    depth += 1
    f()

try:
    f()
except RuntimeError:
    print(depth)
#>>> 999
With an unpacking call:
depth = 0

def f():
    global depth
    depth += 1
    f(*())

try:
    f()
except RuntimeError:
    print(depth)
#>>> 500
In theory both should reach about 1000:
import sys
sys.getrecursionlimit()
#>>> 1000
This happens on CPython 2.7 and CPython 3.3.
On PyPy 2.7 and PyPy 3.3 there is a difference, but it is much smaller (1480 vs 1395 and 1526 vs 1395).
As you can see from the disassembly, there is little difference between the two, other than the type of call (CALL_FUNCTION vs CALL_FUNCTION_VAR):
import dis
def f():
    f()
dis.dis(f)
#>>> 34 0 LOAD_GLOBAL 0 (f)
#>>> 3 CALL_FUNCTION 0 (0 positional, 0 keyword pair)
#>>> 6 POP_TOP
#>>> 7 LOAD_CONST 0 (None)
#>>> 10 RETURN_VALUE
def f():
    f(*())
dis.dis(f)
#>>> 47 0 LOAD_GLOBAL 0 (f)
#>>> 3 BUILD_TUPLE 0
#>>> 6 CALL_FUNCTION_VAR 0 (0 positional, 0 keyword pair)
#>>> 9 POP_TOP
#>>> 10 LOAD_CONST 0 (None)
#>>> 13 RETURN_VALUE
The exception message actually offers you a hint. Compare the non-unpacking option:
>>> import sys
>>> sys.setrecursionlimit(4) # to get there faster
>>> def f(): f()
...
>>> f()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in f
File "<stdin>", line 1, in f
File "<stdin>", line 1, in f
RuntimeError: maximum recursion depth exceeded
with:
>>> def f(): f(*())
...
>>> f()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in f
File "<stdin>", line 1, in f
RuntimeError: maximum recursion depth exceeded while calling a Python object
Note the addition of the while calling a Python object. This exception is specific to the PyObject_CallObject() function. You won't see this exception when you set an odd recursion limit:
>>> sys.setrecursionlimit(5)
>>> f()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in f
File "<stdin>", line 1, in f
RuntimeError: maximum recursion depth exceeded
because that is the specific exception raised in the ceval.c frame evaluation code inside PyEval_EvalFrameEx():
/* push frame */
if (Py_EnterRecursiveCall(""))
return NULL;
Note the empty message there. This is a crucial difference.
For your 'regular' function (no variable arguments), what happens is that an optimized path is picked; a Python function that doesn't need tuple or keyword argument unpacking support is handled directly in the fast_function() function of the evaluation loop. A new frameobject with the Python bytecode object for the function is created, and run. This is one recursion check.
But for a function call with variable arguments (tuple or dictionary or both), the fast_function() call cannot be used. Instead, ext_do_call() (extended call) is used, which handles the argument unpacking, then uses PyObject_Call() to invoke the function. PyObject_Call() does a recursion limit check, and 'calls' the function object. The function object is invoked via the function_call() function, which calls PyEval_EvalCodeEx(), which calls PyEval_EvalFrameEx(), which makes the second recursion limit check.
TL;DR version
Python functions calling Python functions are optimised and bypass the PyObject_Call() C-API function, unless argument unpacking takes place. Both Python frame execution and PyObject_Call() make recursion limit tests, so bypassing PyObject_Call() avoids incrementing the recursion limit check per call.
More places with 'extra' recursion depth checks
You can grep the Python source code for Py_EnterRecursiveCall for other locations where recursion depth checks are made; various libraries, such as json and pickle use it to avoid parsing structures that are too deeply nested or recursive, for example. Other checks are placed in the list and tuple __repr__ implementations, rich comparisons (__gt__, __lt__, __eq__, etc.), handling the __call__ callable object hook and handling __str__ calls.
As such, you can hit the recursion limit much faster still:
>>> class C:
...     def __str__(self):
...         global depth
...         depth += 1
...         return self()
...     def __call__(self):
...         global depth
...         depth += 1
...         return str(self)
...
>>> depth = 0
>>> sys.setrecursionlimit(10)
>>> C()()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 9, in __call__
File "<stdin>", line 5, in __str__
RuntimeError: maximum recursion depth exceeded while calling a Python object
>>> depth
2
With iter(), I can do this:
>>> listWalker = iter ( [23, 47, 'hike'] )
>>> for x in listWalker: print x,
But I could do this anyway:
>>> listWalker = [23, 47, 'hike']
>>> for x in listWalker: print x,
What value does it add?
In addition to using iter to explicitly get an iterator for an object that implements the __iter__ method, there is the lesser-known two-argument form of iter, which makes an iterator which repeatedly calls a function until it returns a given sentinel value.
for line in iter(f.readline, 'EOF'):
    print line
The preceding code would call f.readline (for, say, an open file handle f) until it reads a line consisting of the string EOF. It's roughly the same as writing
for line in f:
    if line == "EOF":
        break
    print line
Additionally, an iterator may be a distinct object from the object it iterates over. This is true for the list type. That means you can create two iterators, both of which iterate independently over the same object.
itr1 = iter(mylist)
itr2 = iter(mylist)
x = next(itr1) # First item of mylist
y = next(itr1) # Second item of my list
z = next(itr2) # First item of mylist, not the third
File handles, however, act as their own iterator:
>>> f = open('.bashrc')
>>> id(f)
4454569712
>>> id(iter(f))
4454569712
In general, the object returned by iter depends on the __iter__ method implemented by the object's type.
The point of iter is that it allows you to obtain the iterator from an iterable object and use it yourself, either to implement your own variant of the for loop, or to maintain the state of the iteration across multiple loops. A trivial example:
it = iter(['HEADER', 0, 1, 2, 3]) # coming from CSV or such
title = it.next()
for item in it:
    # process item
    ...
A more advanced usage of iter is provided by this grouping idiom:
def in_groups(iterable, n):
    """Yield elements from iterable grouped in tuples of size n."""
    it = iter(iterable)
    iters = [it] * n
    return zip(*iters)
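For instance, assuming Python 3 (where zip returns an iterator):

>>> list(in_groups(range(9), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
>>> list(in_groups('abcdefg', 3))   # trailing items that don't fill a group are dropped
[('a', 'b', 'c'), ('d', 'e', 'f')]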
When you write a for loop over a variable, it implicitly calls the __iter__ method of the iterable you passed. You're always using iter() in some way when you're looping over lists, tuples... and every other iterable.
I think this extract of bytecode can convince you:
>>> def a():
...     for x in [1,2,3]:
...         print x
...
>>> import dis
>>> dis.dis(a)
2 0 SETUP_LOOP 28 (to 31)
3 LOAD_CONST 1 (1)
6 LOAD_CONST 2 (2)
9 LOAD_CONST 3 (3)
12 BUILD_LIST 3
15 GET_ITER # <--- get iter is important here
>> 16 FOR_ITER 11 (to 30)
19 STORE_FAST 0 (x)
3 22 LOAD_FAST 0 (x)
25 PRINT_ITEM
26 PRINT_NEWLINE
27 JUMP_ABSOLUTE 16
>> 30 POP_BLOCK
>> 31 LOAD_CONST 0 (None)
34 RETURN_VALUE
But iterators also let you do other things in Python, such as using next() to walk through an iterable by hand, or handling the StopIteration exception. That can be useful if you're dealing with different object types and you want to apply a generic algorithm.
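For example, driving an iterator by hand with next() looks like this, including the StopIteration you would handle in a generic algorithm (Python 3 spelling; in Python 2 you would call it.next() instead):

>>> it = iter([1, 2])
>>> next(it)
1
>>> next(it)
2
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> next(iter([]), 'default')   # a default value suppresses the exception
'default'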
From the docs:
iter(o[, sentinel])
[...] Without a second argument, o must be a collection object which supports the
iteration protocol (the __iter__() method), or it must support the
sequence protocol (the __getitem__() method with integer arguments
starting at 0). If it does not support either of those protocols,
TypeError is raised. [...]
So it constructs an iterator from an object.
As you say, this is done automatically in loops and comprehensions, but sometimes you want to get an iterator and handle it directly. Just keep it in the back of your mind until you need it.
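As a small illustration of the sequence-protocol case from the quote below, iter() can also build an iterator for an object that only defines __getitem__ (a sketch; the Squares class is made up for this example):

class Squares:
    """Old-style sequence protocol: __getitem__ with integer indexes from 0."""
    def __getitem__(self, i):
        if i >= 5:
            raise IndexError   # signals the end of the sequence
        return i * i

print(list(iter(Squares())))   # [0, 1, 4, 9, 16]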
When using the second argument:
If the second argument, sentinel, is given, then o must be a callable object.
The iterator created in this case will call o with no arguments for each call
to its next() method; if the value returned is equal to sentinel,
StopIteration will be raised, otherwise the value will be returned.
This is useful for many things, but particularly so for legacy-style functions like file.read(bufsize), which has to be called repeatedly until it returns "". That can be converted to an iterator with iter(lambda: file.read(bufsize), ""). Nice and clean!
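A minimal sketch of that file.read(bufsize) case (the filenames and buffer size are placeholders):

# Copy a file in fixed-size chunks using the two-argument form of iter().
with open('source.bin', 'rb') as src, open('dest.bin', 'wb') as dst:
    for chunk in iter(lambda: src.read(4096), b''):
        dst.write(chunk)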