Is `iter(callable, sentinel)` still Pythonic?

The second argument of the iter function is useful for looping over objects that don't define themselves as iterable, such as binary files:
response = b''
for block in iter(partial(f.read, 256), b''):
    response += block
However, in Python 3.8 we now have the walrus operator, which the What's New in Python 3.8 article mentions as a way to solve this exact problem:
# Loop over fixed length blocks
while (block := f.read(256)) != '':
    process(block)
I wonder if the latter is now considered "the right approach"? And if so, whether there is ever any need for the second argument of iter, since any code of the form
for x in iter(f, y):
    g(x)
might as well now be written:
while (x := f()) != y:
    g(x)
I guess there might still be cases where we don't want to immediately loop over the iterable, such as b''.join(iter(partial(f.read, 256), b'')) or similar code (though it quickly gets pretty hairy).
Also a loop like for i, x in enumerate(iter(f, y)): might be hard to translate to the new syntax(?)
The PEP for walrus only mentions 2-arg iter in the example while h(x := f()): g(x), which it says can't trivially be translated to iter.
Python usually has pretty precise guidelines on these sorts of things, but I haven't been able to find any for this particular issue. Can you help me?

The assignment expression is useful if you are primarily interested in immediately iterating over the iterator, but it doesn't help you define an iterator to be used elsewhere.
For example, you may want to create an iterator that will first be wrapped in map, or filter, or itertools.islice, before finally iterating over the final result using a for loop.
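For instance (a sketch of my own; the file name is made up), the 2-arg iter form hands you a first-class iterator that other tools can consume before any explicit loop runs:

from functools import partial
from itertools import islice

with open("data.bin", "rb") as f:
    blocks = iter(partial(f.read, 256), b'')   # nothing has been read yet
    first_kib = b''.join(islice(blocks, 4))    # consume only the first 4 blocks
    sizes = map(len, blocks)                   # lazily measure the remaining blocks
    total_rest = sum(sizes)

With the walrus form you would have to restructure this into explicit while loops, because the expression never gives you an iterator object to pass around.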

Related

Python use lambda as for loop recursion error

I'm messing with python, and I'm looking for a way to replicate a for loop in a lambda.
Basically, I'd like to convert the function below to a lambda doing the same thing :
def basicForLoop(x):
    for i in range(x):
        print(i)

basicForLoop(100)
For now, I've managed to do it by using recursion and increasing the value to each new recursion :
(lambda f: lambda x: f(f,0, x))(lambda f,current,max: print(current) or f(f, current+1, max) if current <= max else None)(100)
This works rather well, but it hits the max recursion depth as soon as the number starts to get too big, so I'm looking for a way to rearrange this lambda so that it can be used without worrying about the recursion depth, making it truly equivalent to the original function.
EDIT: I'm looking for a way to do this while keeping the loop logic directly inside the lambda; delegating the loop to another function like map, join, ... isn't what I'm looking for.
PS. I know very well that this is an abomination that should never be used in reality but I'm just curious about it.
I'm pretty sure this is impossible.
So I'm assuming you want to keep pretty much all of the logic handled by your lambdas, not even using a range. If that's the case, you're not going to get a stack-safe solution in Python. In other languages you could rely on tail-call optimization (tail-recursion elimination), which allows the interpreter/compiler to collapse some recursive calls down to a single stack frame, but Python does not support that.
I don't think you can make this use fewer stack frames, either. Rewriting and re-formatting a bit, and adding explicit names and more print statements:
buildRecursive = (lambda g:
    print("Running 1st") or
    (lambda x:
        print("Running 2nd") or
        g(g, 0, x))
    )

entry = buildRecursive(lambda f, current, max:
    print("Running 3rd") or
    print(current) or f(f, current+1, max) if current <= max else None)

entry(100)
This should be equivalent to what you have. This has print statements as the first operation of every call, and you can see that you're only running the 3rd one repeatedly. Essentially, you're generating as few stack frames per iteration as possible, given the constraints as I understand them.
As an aside, after some reading I understand why you're doing the or trick, but coming from other languages it is downright hideous. It may be the Python way of doing things, but it's a pretty awful way of sequencing operations, especially because of short-circuiting: if you chain operations where the first one returns something truthy, the later ones will silently never run. I would suggest using a tuple instead - (firstOp, secondOp) - but I think that has memory and performance implications, since Python will actually build the resulting tuple.
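To illustrate the pitfall and the alternative (a small sketch of my own, not part of your original code):

# `print(x) or expr` only reaches `expr` because print() returns None (falsy).
(lambda x: print(x) or x + 1)(41)          # prints 41, evaluates to 42

# If the first operand happens to be truthy, the rest is silently skipped:
(lambda x: len(str(x)) or print("never runs"))(41)

# Tuple sequencing always evaluates both, at the cost of building the tuple:
(lambda x: (print(x), x + 1)[1])(41)       # prints 41, evaluates to 42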
You might define your own infix operator which evaluates both the left and right operands in order and returns the second (or the first... can you feel functional programming calling yet?). However, in Python I think this will result in additional stack frames, since the operator call produces its own, totally trivial, stack frame.
Have you explored languages other than Python? If not I'd say it's time.
You could do something like that:
x = lambda x: print("\n".join(map(str, list(range(1,x+1)))))
x(100)
Edit:
You can do it like that:
x = lambda x: print(*range(1,x+1), sep='\n')
x(100)

Do different types of recursion have different memory complexities? [duplicate]

I have the following piece of code which fails with the following error:
RuntimeError: maximum recursion depth exceeded
I attempted to rewrite this to allow for tail recursion optimization (TCO). I believe that this code should have been successful if a TCO had taken place.
def trisum(n, csum):
    if n == 0:
        return csum
    else:
        return trisum(n - 1, csum + n)

print(trisum(1000, 0))
Should I conclude that Python does not do any type of TCO, or do I just need to define it differently?
No, and it never will, since Guido van Rossum prefers to be able to have proper tracebacks:
Tail Recursion Elimination (2009-04-22)
Final Words on Tail Calls (2009-04-27)
You can manually eliminate the recursion with a transformation like this:
>>> def trisum(n, csum):
...     while True:                     # Change recursion to a while loop
...         if n == 0:
...             return csum
...         n, csum = n - 1, csum + n   # Update parameters instead of tail recursion
...
>>> trisum(1000, 0)
500500
I published a module performing tail-call optimization (handling both tail-recursion and continuation-passing style): https://github.com/baruchel/tco
Optimizing tail-recursion in Python
It has often been claimed that tail-recursion doesn't suit the Pythonic way of coding and that one shouldn't care about how to embed it in a loop. I don't want to argue with this point of view; sometimes, however, I like trying or implementing new ideas as tail-recursive functions rather than with loops, for various reasons (focusing on the idea rather than on the process, having twenty short functions on my screen at the same time rather than only three "Pythonic" functions, working in an interactive session rather than editing my code, etc.).
Optimizing tail-recursion in Python is in fact quite easy. While it is said to be impossible, or very tricky, I think it can be achieved with elegant, short and general solutions; I even think that most of these solutions don't use Python features in any way other than how they are meant to be used. Clean lambda expressions working along with very standard loops lead to quick, efficient and fully usable tools for implementing tail-recursion optimization.
As a personal convenience, I wrote a small module implementing such an optimization in two different ways. I would like to discuss my two main functions here.
The clean way: modifying the Y combinator
The Y combinator is well known; it allows lambda functions to be used in a recursive manner, but by itself it doesn't allow recursive calls to be embedded in a loop. Lambda calculus alone can't do such a thing. A slight change in the Y combinator, however, can protect the recursive call from actually being evaluated. Evaluation can thus be delayed.
Here is the famous expression for the Y combinator:
lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
With a very slight change, I could get:
lambda f: (lambda x: x(x))(lambda y: f(lambda *args: lambda: y(y)(*args)))
Instead of calling itself, the function f now returns a function performing the
very same call, but since it returns it, the evaluation can be done later from outside.
My code is:
def bet(func):
    b = (lambda f: (lambda x: x(x))(lambda y:
          f(lambda *args: lambda: y(y)(*args))))(func)
    def wrapper(*args):
        out = b(*args)
        while callable(out):
            out = out()
        return out
    return wrapper
The function can be used in the following way; here are two examples with tail-recursive
versions of factorial and Fibonacci:
>>> from recursion import *
>>> fac = bet( lambda f: lambda n, a: a if not n else f(n-1,a*n) )
>>> fac(5,1)
120
>>> fibo = bet( lambda f: lambda n,p,q: p if not n else f(n-1,q,p+q) )
>>> fibo(10,0,1)
55
Obviously recursion depth isn't an issue any longer:
>>> bet( lambda f: lambda n: 42 if not n else f(n-1) )(50000)
42
This is of course the single real purpose of the function.
Only one thing can't be done with this optimization: it can't be used with a
tail-recursive function evaluating to another function (this comes from the fact
that callable returned objects are all handled as further recursive calls with
no distinction). Since I usually don't need such a feature, I am very happy
with the code above. However, in order to provide a more general module, I thought
a little more in order to find some workaround for this issue (see next section).
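To make the limitation concrete (a sketch of my own, reusing bet from above): a tail-recursive function whose final result is itself callable gets mistaken for a pending call by the trampoline.

# f(n) is meant to evaluate to the identity function for any n...
f = bet(lambda f: lambda n: (lambda x: x) if not n else f(n - 1))

try:
    f(5)(42)      # we would like the identity function back...
except TypeError as exc:
    # ...but bet's while loop saw a callable result, treated it as a pending
    # recursive call, and invoked it with no arguments.
    print(exc)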
Concerning the speed of this process (which isn't the real issue however), it happens
to be quite good; tail-recursive functions are even evaluated much quicker than with
the following code using simpler expressions:
def bet1(func):
    def wrapper(*args):
        out = func(lambda *x: lambda: x)(*args)
        while callable(out):
            out = func(lambda *x: lambda: x)(*out())
        return out
    return wrapper
I think that evaluating one expression, even complicated, is much quicker than
evaluating several simple expressions, which is the case in this second version.
I didn't keep this new function in my module, and I see no circumstances where it
could be used rather than the "official" one.
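For the curious, here is a rough way to compare the two versions yourself (a sketch of my own, assuming bet and bet1 from above are both defined; the workloads are arbitrary):

import timeit

fac  = bet(lambda f: lambda n, a: a if not n else f(n - 1, a * n))
fac1 = bet1(lambda f: lambda n, a: a if not n else f(n - 1, a * n))

# Each call runs 500 "bounces" of the trampoline; bet evaluates only the small
# thunk per bounce, while bet1 rebuilds func(...) on every bounce.
print(timeit.timeit(lambda: fac(500, 1), number=1000))
print(timeit.timeit(lambda: fac1(500, 1), number=1000))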
Continuation passing style with exceptions
Here is a more general function; it is able to handle all tail-recursive functions,
including those returning other functions. Recursive calls are recognized from
other return values by the use of exceptions. This solution is slower than the
previous one; a quicker code could probably be written by using some special
values as "flags" being detected in the main loop, but I don't like the idea of
using special values or internal keywords. There is some funny interpretation
of using exceptions: if Python doesn't like tail-recursive calls, an exception
should be raised when a tail-recursive call does occur, and the Pythonic way will be
to catch the exception in order to find some clean solution, which is actually what
happens here...
class _RecursiveCall(Exception):
    def __init__(self, *args):
        self.args = args

def _recursiveCallback(*args):
    raise _RecursiveCall(*args)

def bet0(func):
    def wrapper(*args):
        while True:
            try:
                return func(_recursiveCallback)(*args)
            except _RecursiveCall as e:
                args = e.args
    return wrapper
Now all functions can be used. In the following example, f(n) is evaluated to the
identity function for any positive value of n:
>>> f = bet0( lambda f: lambda n: (lambda x: x) if not n else f(n-1) )
>>> f(5)(42)
42
Of course, it could be argued that exceptions are not intended to be used for intentionally redirecting the interpreter (as a kind of goto statement, or probably rather a kind of continuation passing style), which I have to admit. But, again, I find the idea of using try with a single line being a return statement amusing: we try to return something (normal behaviour) but we can't do it because a recursive call occurs (exception).
Initial answer (2013-08-29).
I wrote a very small plugin for handling tail recursion. You may find it with my explanations there: https://groups.google.com/forum/?hl=fr#!topic/comp.lang.python/dIsnJ2BoBKs
It can embed a lambda function written with a tail recursion style in another function which will evaluate it as a loop.
The most interesting feature in this small function, in my humble opinion, is that the function doesn't rely on some dirty programming hack but on mere lambda calculus: the behaviour of the function is changed to another one when inserted in another lambda function which looks very like the Y combinator.
The word of Guido is at http://neopythonic.blogspot.co.uk/2009/04/tail-recursion-elimination.html
I recently posted an entry in my Python History blog on the origins of
Python's functional features. A side remark about not supporting tail
recursion elimination (TRE) immediately sparked several comments about
what a pity it is that Python doesn't do this, including links to
recent blog entries by others trying to "prove" that TRE can be added
to Python easily. So let me defend my position (which is that I don't
want TRE in the language). If you want a short answer, it's simply
unpythonic. Here's the long answer:
CPython does not and will probably never support tail call optimization based on Guido van Rossum's statements on the subject.
I've heard arguments that it makes debugging more difficult because of how it modifies the stack trace.
Try the experimental macropy TCO implementation for size.
Besides optimizing tail recursion, you can set the recursion depth manually by:
import sys
sys.setrecursionlimit(5500000)
print("recursion limit:%d " % (sys.getrecursionlimit()))
There is no built-in tail recursion optimization in Python. However, we can "rebuild" the function through the abstract syntax tree (AST), eliminating the recursion there and replacing it with a loop. Guido was absolutely right: this approach has some limitations, so I can't recommend it for use.
However, I still wrote (rather as a training example) my own version of the optimizer, and you can even try how it works.
Install this package via pip:
pip install astrologic
Now you can run this sample code:
from astrologic import no_recursion

counter = 0

@no_recursion
def recursion():
    global counter
    counter += 1
    if counter != 10000000:
        return recursion()
    return counter

print(recursion())
This solution is not stable, and you should never use it in production. You can read about some significant restrictions on the project's GitHub page (in Russian, sorry). However, this solution is quite "real": it works without interrupting the code or other similar tricks.
A tail call can never be optimized to a jump in Python. An optimization is a program transformation that preserves the program's meaning. Tail-call elimination doesn't preserve the meaning of Python programs.
One problem, often mentioned, is that tail-call elimination changes the call stack, and Python allows for runtime introspection of the stack. But there is another problem that is rarely mentioned. There is probably a lot of code like this in the wild:
import mmap

def map_file(path):
    f = open(path, 'rb')
    return mmap.mmap(f.fileno(), 0)   # map the whole file; the call is in tail position
The call to mmap.mmap is in tail position. If it were replaced by a jump, then the current stack frame would be discarded before control was passed to mmap. The current stack frame contains the only reference to the file object, so the file object could (and in CPython would) be freed before mmap is called, which would close the file descriptor, invalidating it before mmap sees it.
At best, the code would fail with an exception. At worst, the file descriptor could be reused in another thread, causing mmap to map the wrong file. So this "optimization" would be a potentially disastrous thing to unleash on the huge body of existing Python code.
The Python spec guarantees that such problems won't occur, so you can be sure that no conformant implementation will ever convert return f(args) into a jump—unless, perhaps, it has a sophisticated static analysis engine that can prove that discarding an object early will have no observable consequences in this case.
None of that would prevent Python from adding a syntax for explicit tail calls with jump semantics, such as
return from f(args)
That wouldn't break code that didn't use it, and it would probably be useful for autogenerated code and some algorithms. GvR is no longer BDFL, so it might happen, but I wouldn't hold my breath.

Confused on why generators are useful [duplicate]

I'm starting to learn Python and I've come across generator functions, those that have a yield statement in them. I want to know what types of problems that these functions are really good at solving.
Generators give you lazy evaluation. You use them by iterating over them, either explicitly with 'for' or implicitly by passing it to any function or construct that iterates. You can think of generators as returning multiple items, as if they return a list, but instead of returning them all at once they return them one-by-one, and the generator function is paused until the next item is requested.
Generators are good for calculating large sets of results (in particular calculations involving loops themselves) where you don't know if you are going to need all results, or where you don't want to allocate the memory for all results at the same time. Or for situations where the generator uses another generator, or consumes some other resource, and it's more convenient if that happened as late as possible.
Another use for generators (that is really the same) is to replace callbacks with iteration. In some situations you want a function to do a lot of work and occasionally report back to the caller. Traditionally you'd use a callback function for this. You pass this callback to the work-function and it would periodically call this callback. The generator approach is that the work-function (now a generator) knows nothing about the callback, and merely yields whenever it wants to report something. The caller, instead of writing a separate callback and passing that to the work-function, does all the reporting work in a little 'for' loop around the generator.
For example, say you wrote a 'filesystem search' program. You could perform the search in its entirety, collect the results and then display them one at a time. All of the results would have to be collected before you showed the first, and all of the results would be in memory at the same time. Or you could display the results while you find them, which would be more memory efficient and much friendlier towards the user. The latter could be done by passing the result-printing function to the filesystem-search function, or it could be done by just making the search function a generator and iterating over the result.
If you want to see an example of the latter two approaches, see os.path.walk() (the old filesystem-walking function with callback) and os.walk() (the new filesystem-walking generator.) Of course, if you really wanted to collect all results in a list, the generator approach is trivial to convert to the big-list approach:
big_list = list(the_generator)
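As a small, self-contained sketch of the two styles described above (the ".py" filter is just an example I picked; the names are my own):

import os

# Callback style: the worker drives the loop and reports each hit via a callback.
def find_py_callback(root, report):
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".py"):
                report(os.path.join(dirpath, name))

# Generator style: the worker just yields; the caller owns the loop.
def find_py(root):
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".py"):
                yield os.path.join(dirpath, name)

find_py_callback(".", print)       # results arrive via the callback
for hit in find_py("."):           # results arrive one at a time in a plain for loop
    print(hit)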
One of the reasons to use a generator is to make the solution clearer for some kinds of problems.
The other is to treat results one at a time, avoiding building huge lists of results that you would process separately anyway.
If you have a fibonacci-up-to-n function like this:
# function version
def fibon(n):
    a = b = 1
    result = []
    for i in xrange(n):
        result.append(a)
        a, b = b, a + b
    return result
You can write the function more easily like this:
# generator version
def fibon(n):
    a = b = 1
    for i in xrange(n):
        yield a
        a, b = b, a + b
The function is clearer. And if you use the function like this:
for x in fibon(1000000):
    print x,
In this example, when using the generator version, the whole 1000000-item list won't be created at all; values are produced one at a time. That would not be the case with the list version, where the whole list would be created first.
Real World Example
Let's say you have 100 million domains in your MySQL table, and you would like to update Alexa rank for each domain.
First thing you need is to select your domain names from the database.
Let's say your table name is domains and column name is domain.
If you use SELECT domain FROM domains, it's going to return 100 million rows, which is going to consume a lot of memory. So your server might crash.
So you decided to run the program in batches. Let's say our batch size is 1000.
In our first batch we will query the first 1000 rows, check Alexa rank for each domain and update the database row.
In our second batch we will work on the next 1000 rows. In our third batch it will be from 2001 to 3000 and so on.
Now we need a generator function which generates our batches.
Here is our generator function:
def ResultGenerator(cursor, batchsize=1000):
    while True:
        results = cursor.fetchmany(batchsize)
        if not results:
            break
        for result in results:
            yield result
As you can see, our function keeps yielding the results. If you used the keyword return instead of yield, then the whole function would be ended once it reached return.
return - returns only once
yield - returns multiple times
If a function uses the keyword yield then it's a generator.
Now you can iterate like this:
import MySQLdb

db = MySQLdb.connect(host="localhost", user="root", passwd="root", db="domains")
cursor = db.cursor()
cursor.execute("SELECT domain FROM domains")

for result in ResultGenerator(cursor):
    doSomethingWith(result)
db.close()
I found this explanation helpful; it cleared up my doubts, because a person who doesn't know about generators may not know about yield either.
Return
The return statement is where all the local variables are destroyed and the resulting value is given back (returned) to the caller. Should the same function be called some time later, the function will get a fresh new set of variables.
Yield
But what if the local variables aren't thrown away when we exit a function? This implies that we can resume the function where we left off. This is where the concept of generators is introduced; the yield statement resumes where the function left off.
def generate_integers(N):
    for i in xrange(N):
        yield i

In [1]: gen = generate_integers(3)
In [2]: gen
<generator object at 0x8117f90>
In [3]: gen.next()
0
In [4]: gen.next()
1
In [5]: gen.next()
2
So that's the difference between return and yield statements in Python.
Yield statement is what makes a function a generator function.
So generators are a simple and powerful tool for creating iterators. They are written like regular functions, but they use the yield statement whenever they want to return data. Each time next() is called, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
See the "Motivation" section in PEP 255.
A non-obvious use of generators is creating interruptible functions, which lets you do things like update UI or run several jobs "simultaneously" (interleaved, actually) while not using threads.
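A tiny sketch of that interleaving idea (the names are made up): each generator is a "job" that pauses at yield, and a little scheduler resumes them in turn without any threads.

from collections import deque

def job(name, steps):
    for i in range(steps):
        print(name, "step", i)   # do a slice of work...
        yield                    # ...then pause so the other jobs can run

def round_robin(jobs):
    queue = deque(jobs)
    while queue:
        j = queue.popleft()
        try:
            next(j)              # resume the job until its next yield
            queue.append(j)
        except StopIteration:    # the job finished; drop it
            pass

round_robin([job("A", 3), job("B", 2)])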
Buffering. When it is efficient to fetch data in large chunks, but process it in small chunks, then a generator might help:
def bufferedFetch():
    while True:
        buffer = getBigChunkOfData()
        # insert some code to break on 'end of data'
        for i in buffer:
            yield i
The above lets you easily separate buffering from processing. The consumer function can now just get the values one by one without worrying about buffering.
I have found that generators are very helpful in cleaning up your code and by giving you a very unique way to encapsulate and modularize code. In a situation where you need something to constantly spit out values based on its own internal processing and when that something needs to be called from anywhere in your code (and not just within a loop or a block for example), generators are the feature to use.
An abstract example would be a Fibonacci number generator that does not live within a loop and when it is called from anywhere will always return the next number in the sequence:
def fib():
    first = 0
    second = 1
    yield first
    yield second
    while 1:
        next = first + second
        yield next
        first = second
        second = next

fibgen1 = fib()
fibgen2 = fib()
Now you have two Fibonacci number generator objects which you can call from anywhere in your code and they will always return ever larger Fibonacci numbers in sequence as follows:
>>> fibgen1.next(); fibgen1.next(); fibgen1.next(); fibgen1.next()
0
1
1
2
>>> fibgen2.next(); fibgen2.next()
0
1
>>> fibgen1.next(); fibgen1.next()
3
5
The lovely thing about generators is that they encapsulate state without having to go through the hoops of creating objects. One way of thinking about them is as "functions" which remember their internal state.
I got the Fibonacci example from Python Generators - What are they? and with a little imagination, you can come up with a lot of other situations where generators make for a great alternative to for loops and other traditional iteration constructs.
The simple explanation:
Consider a for statement
for item in iterable:
    do_stuff()
A lot of the time, all the items in iterable don't need to be there from the start, but can be generated on the fly as they're required. This can be a lot more efficient in both space (you never need to store all the items simultaneously) and time (the iteration may finish before all the items are needed).
Other times, you don't even know all the items ahead of time. For example:
for command in user_input():
    do_stuff_with(command)
You have no way of knowing all the user's commands beforehand, but you can use a nice loop like this if you have a generator handing you commands:
def user_input():
    while True:
        wait_for_command()
        cmd = get_command()
        yield cmd
With generators you can also have iteration over infinite sequences, which is of course not possible when iterating over containers.
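For example (a minimal sketch of my own): an endless counter can only exist as a generator, and you take just the slice you need.

from itertools import islice

def naturals():          # infinite: no container could ever hold all of these
    n = 0
    while True:
        yield n
        n += 1

print(list(islice(naturals(), 5)))   # take just the first five: [0, 1, 2, 3, 4]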
My favorite uses are "filter" and "reduce" operations.
Let's say we're reading a file, and only want the lines which begin with "##".
def filter2sharps( aSequence ):
    for l in aSequence:
        if l.startswith("##"):
            yield l
We can then use the generator function in a proper loop
source = file( ... )
for line in filter2sharps( source.readlines() ):
    print line
source.close()
The reduce example is similar. Let's say we have a file where we need to locate blocks of <Location>...</Location> lines. [Not HTML tags, but lines that happen to look tag-like.]
def reduceLocation( aSequence ):
    keep = False
    block = None
    for line in aSequence:
        if line.startswith("</Location"):
            block.append( line )
            yield block
            block = None
            keep = False
        elif line.startswith("<Location"):
            block = [ line ]
            keep = True
        elif keep:
            block.append( line )
        else:
            pass
    if block is not None:
        yield block # A partial block, icky
Again, we can use this generator in a proper for loop.
source = file( ... )
for b in reduceLocation( source.readlines() ):
    print b
source.close()
The idea is that a generator function allows us to filter or reduce a sequence, producing another sequence one value at a time.
A practical example where you could make use of a generator is if you have some kind of shape and you want to iterate over its corners, edges or whatever. For my own project (source code here) I had a rectangle:
class Rect():
    def __init__(self, x, y, width, height):
        self.l_top = (x, y)
        self.r_top = (x+width, y)
        self.r_bot = (x+width, y+height)
        self.l_bot = (x, y+height)

    def __iter__(self):
        yield self.l_top
        yield self.r_top
        yield self.r_bot
        yield self.l_bot
Now I can create a rectangle and loop over its corners:
myrect = Rect(50, 50, 100, 100)
for corner in myrect:
    print(corner)
Instead of __iter__ you could have a method iter_corners and call that with for corner in myrect.iter_corners(). It's just more elegant to use __iter__ since then we can use the class instance name directly in the for expression.
Basically avoiding call-back functions when iterating over input maintaining state.
See here and here for an overview of what can be done using generators.
Since the send method of a generator has not been mentioned, here is an example:
def test():
    for i in xrange(5):
        val = yield
        print(val)

t = test()

# Proceed to 'yield' statement
next(t)

# Send value to yield
t.send(1)
t.send('2')
t.send([3])
It shows the possibility of sending a value to a running generator. There is a more advanced course on generators in the video below (including an explanation of yield from, generators for parallel processing, escaping the recursion limit, etc.):
David Beazley on generators at PyCon 2014
Some good answers here, however, I'd also recommend a complete read of the Python Functional Programming tutorial which helps explain some of the more potent use-cases of generators.
Particularly interesting is that it is now possible to update the yield variable from outside the generator function, hence making it possible to create dynamic and interwoven coroutines with relatively little effort.
Also see PEP 342: Coroutines via Enhanced Generators for more information.
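As a rough sketch of that coroutine style (my own example, not taken from the PEP): a generator can receive values through yield and keep its state between sends.

def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average    # hand back the current average, wait for the next value
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)                        # prime the coroutine up to the first yield
print(avg.send(10))              # 10.0
print(avg.send(20))              # 15.0
print(avg.send(30))              # 20.0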
I use generators when our web server is acting as a proxy:
The client requests a proxied url from the server
The server begins to load the target url
The server yields to return the results to the client as soon as it gets them
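Roughly like this (a simplified sketch using only the standard library; the real server code would differ):

import urllib.request

def proxy_stream(url, chunk_size=8192):
    # Forward the upstream response chunk by chunk instead of buffering it all.
    with urllib.request.urlopen(url) as upstream:
        while True:
            chunk = upstream.read(chunk_size)
            if not chunk:
                break
            yield chunk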
Piles of stuff. Any time you want to generate a sequence of items, but don't want to have to 'materialize' them all into a list at once. For example, you could have a simple generator that returns prime numbers:
import itertools

def primes():
    primes_found = set()
    primes_found.add(2)
    yield 2
    for i in itertools.count(1):
        candidate = i * 2 + 1
        if all(candidate % prime for prime in primes_found):
            primes_found.add(candidate)
            yield candidate
You could then use that to generate the products of subsequent primes:
def prime_products():
    primeiter = primes()
    prev = primeiter.next()
    for prime in primeiter:
        yield prime * prev
        prev = prime
These are fairly trivial examples, but you can see how it can be useful for processing large (potentially infinite!) datasets without generating them in advance, which is only one of the more obvious uses.
Also good for printing the prime numbers up to n:
def genprime(n=10):
    for num in range(3, n+1):
        for factor in range(2, num):
            if num % factor == 0:
                break
        else:
            yield num

for prime_num in genprime(100):
    print(prime_num)

Python iterating inside function arguments

I know I’ve seen (perhaps exclusively in other languages) where you can use for loops in function arguments. I forget what it was called, but in an attempt to make my code smaller I want to try it. For those of you who don't know what I'm talking about, it goes something like this:
math.sum(for i in range(5)) # Just an example; code will probably not work
Or something like that? I'm not sure how it works yet, but I intend to learn. I know there is a name for this sort of thing, but I've forgotten what it is. Could anyone give me some pointers, or am I insane and this doesn't exist in python?
A "for loop as an expression" is usually called a "comprehension", at least in Haskell, Python, and other languages inspired by them.
Read List Comprehensions in the tutorial for an introduction to the idea. There are also set comprehensions and dict comprehensions, which are pretty obvious once you get list comprehensions.
Then there are generator expressions, which are a bit trickier—but a lot cooler. You're not going to understand those until you first read Iterators, and then Generators, and then Generator Expressions is the very next section.
It still probably won't be clear why generator expressions are cool, but David Beazley explains that masterfully.
To translate your code to real code, all you need is:
math.sum(i for i in range(5))
However, what you're asking for is "all of the elements of range(5)", which you can do a lot more easily like this:
math.sum(range(5))
Why? Because a range is already an iterable object.* If it weren't, you couldn't use it in a for loop in the first place, by definition.
Unless you have either some expression to perform on each element, an if clause to filter the loop, or multiple for clauses for nested looping, comprehensions don't buy you anything. So, here are some more useful examples:
math.sum(i*i for i in range(5))
math.sum(i for i in range(5) if i%3 != 0)
math.sum(j for i in range(5) for j in range(i))
* Technically speaking, you're asking for an iterator over all of the elements in range(5), not just any iterable over them. For a for loop it doesn't matter, but if you need something that you can call next on manually, have it remember its current position, etc., it does. In that case, what you want is iter(range(5)).
The fact that your comprehension happens to be a function argument is almost completely irrelevant here. You can use them anywhere you can use an expression:
squares_to_5 = (i*i for i in range(5)) # often useful
for square in (i*i for i in range(5)): # silly, but legal
However, notice that generator expressions need to be put inside parentheses. In the special case where a generator expression is the only argument to a function, so it's already in parentheses, you can leave the extra parentheses off.
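To make that parenthesization rule concrete (a small sketch of my own, using the builtins sum and sorted):

# Sole argument: the call's parentheses double as the genexp's.
total = sum(i * i for i in range(5))

# Any extra argument means the genexp needs its own parentheses:
by_size = sorted((i * i for i in range(5)), reverse=True)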
You're thinking of list comprehensions and generator expressions.
This would work in Python with only a slight modification:
sum(i for i in range(5))
This is the seminal work on generators: http://www.dabeaz.com/generators/
Technically speaking they are completely unrelated to the fact that you're using them as function arguments:
x = (i for i in range(5))
evens = [i for i in range(100) if i % 2 == 0]
even_squares = [i**2 for i in evens]

How to write Python generator function that never yields anything

I want to write a Python generator function that never actually yields anything. Basically it's a "do-nothing" drop-in that can be used by other code which expects to call a generator (but doesn't always need results from it). So far I have this:
def empty_generator():
    # ... do some stuff, but don't yield anything
    if False:
        yield
Now, this works OK, but I'm wondering if there's a more expressive way to say the same thing, that is, declare a function to be a generator even if it never yields any value. The trick I've employed above is to show Python a yield statement inside my function, even though it is unreachable.
Another way is
def empty_generator():
    return
    yield
Not really "more expressive", but shorter. :)
Note that iter([]) or simply [] will do as well.
An even shorter solution:
def empty_generator():
    yield from []
For maximum readability and maintainability, I would prioritize a construct which goes at the top of the function. So either
your original if False: yield construct, but hoisted to the very first line, or
a separate decorator which adds generator behavior to a non-generator callable.
(That's assuming you didn't just need a callable which did something and then returned an empty iterable/iterator. If so then you could just use a regular function and return ()/return iter(()) at the end.)
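That non-generator alternative would look roughly like this (a minimal sketch; the name is made up):

def does_stuff_but_is_not_a_generator():
    # ... do some stuff ...
    return iter(())   # hands back an empty iterator, but this is an ordinary function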
Imagine the reader of your code sees:
def name_fitting_what_the_function_does():
    # We need this function to be an empty generator:
    if False: yield

    # that crucial stuff that this function exists to do
Having this at the top immediately cues in every reader of this function to this detail, which affects the whole function - affects the expectations and interpretations of this function's behavior and usage.
How long is your function body? More than a couple lines? Then as a reader, I will feel righteous fury and condemnation towards the author if I don't get a cue that this function is a generator until the very end, because I will probably have spent significant mental cost weaving a model in my head based on the assumption that this is a regular function - the first yield in a generator should ideally be immediately visible, when you don't even know to look for it.
Also, in a function longer than a few lines, a construct at the very beginning of the function is more trustworthy - I can trust that anyone who has looked at a function has probably seen its first line every time they looked at it. That means a higher chance that if that line was mistaken or broken, someone would have spotted it. That means I can be less vigilant for the possibility that this whole thing is actually broken but being used in a way that makes the breakage non-obvious.
If you're working with people who are sufficiently fluently familiar with the workings of Python, you could even leave off that comment, because to someone who immediately remembers that yield is what makes Python turn a function into a generator, it is obvious that this is the effect, and probably the intent since there is no other reason for correct code to have a non-executed yield.
Alternatively, you could go the decorator route:
@generator_that_yields_nothing
def name_fitting_what_the_function_does():
    # that crucial stuff for which this exists
    ...

import functools

def generator_that_yields_nothing(wrapped):
    @functools.wraps(wrapped)
    def wrapper_generator():
        if False: yield
        wrapped()
    return wrapper_generator
