Attempting to understand yield as an expression - Python

I'm playing around with generators and generator expressions and I'm not completely sure that I understand how they work (some reference material):
>>> a = (x for x in range(10))
>>> next(a)
0
>>> next(a)
1
>>> a.send(-1)
2
>>> next(a)
3
So it looks like generator.send was ignored. That makes sense (I guess) because there is no explicit yield expression to catch the sent information ...
However,
>>> a = ((yield x) for x in range(10))
>>> next(a)
0
>>> print next(a)
None
>>> print next(a)
1
>>> print next(a)
None
>>> a.send(-1) #this send is ignored, Why? ... there's a yield to catch it...
2
>>> print next(a)
None
>>> print next(a)
3
>>> a.send(-1) #this send isn't ignored
-1
I understand this is pretty far out there, and I (currently) can't think of a use-case for this (so don't ask;)
I'm mostly just exploring to try to figure out how these various generator methods work (and how generator expressions work in general). Why does my second example alternate between yielding a sensible value and None? Also, can anyone explain why one of my generator.send calls was ignored while the other wasn't?

The confusion here is that the generator expression is doing a hidden yield. Here it is in function form:
def foo():
    for x in range(10):
        yield (yield x)
When you do a .send(), what happens is the inner yield x gets executed, which yields x. Then the expression evaluates to the value of the .send, and the next yield yields that. Here it is in clearer form:
def foo():
    for x in range(10):
        sent_value = (yield x)
        yield sent_value
Thus the output is very predictable:
>>> a = foo()
#start it off
>>> a.next()
0
#execution has now paused at "sent_value = ?"
#now we fill in the "?". whatever we send here will be immediately yielded.
>>> a.send("yieldnow")
'yieldnow'
#execution is now paused at the 'yield sent_value' expression
#as this is not assigned to anything, whatever is sent now will be lost
>>> a.send("this is lost")
1
#now we're back where we were at the 'yieldnow' point of the code
>>> a.send("yieldnow")
'yieldnow'
#etc, the loop continues
>>> a.send("this is lost")
2
>>> a.send("yieldnow")
'yieldnow'
>>> a.send("this is lost")
3
>>> a.send("yieldnow")
'yieldnow'
EDIT: Example usage. By far the coolest one I've seen so far is twisted's inlineCallbacks function. See here for an article explaining it. The nub of it is that it lets you yield functions to be run in threads, and once the functions are done, twisted sends the result of the function back into your code. Thus you can write code that heavily relies on threads in a very linear and intuitive manner, instead of having to write tons of little functions all over the place.
See PEP 342 for more info on the rationale for .send, along with potential use cases (the twisted example I provided is an example of the boon to asynchronous I/O this change offered).

You're confusing yourself a bit because you actually are generating from two sources: the generator expression (... for x in range(10)) is one generator, but you create another source with the yield. You can see that if you do list(a) you'll get [0, None, 1, None, 2, None, 3, None, 4, None, 5, None, 6, None, 7, None, 8, None, 9, None].
Your code is equivalent to this:
>>> def gen():
...     for x in range(10):
...         yield (yield x)
Only the inner yield ("yield x") is "used" in the generator --- it is used as the value of the outer yield. So this generator iterates back and forth between yielding values of the range, and yielding whatever is "sent" to those yields. If you send something to the inner yield, you get it back, but if you happen to send on an even-numbered iteration, the send is sent to the outer yield and is ignored.

This generator translates into:
for i in xrange(10):
    x = (yield i)
    yield x
The result of every second call to send()/next() is ignored, because you do nothing with the result of one of the yields.

The generator you wrote is equivalent to the more verbose:
def testing():
    for x in range(10):
        x = (yield x)
        yield x
As you can see here, the second yield, which is implicit in the generator expression, does not save the value you pass it; therefore, depending on where the generator's execution is paused, the send may or may not work.

Indeed - the send method is meant to work with a generator object that is the result of a co-routine you have explicitly written. It is difficult to give it any meaning in a generator expression - though it works.
-- EDIT --
I had previously written the text below, but it is incorrect, as the behavior of yield inside generator expressions is predictable across implementations - though not mentioned in any PEP.
generator expressions are not meant to have the yield keyword - I am not sure the behavior is even defined in this case. We could think a little and work out what is happening in your expression, to see where those "None"s are coming from. However, treat that as a side effect of how yield is implemented in Python (and probably implementation dependent), not as something that should be relied upon.
The correct form for a generator expression, in a simplified manner is:
(<expr> for <variable> in <sequence> [if <expr>])
so <expr> is evaluated for each value in <sequence> - not only is yield unneeded there, you should not use it.
Both yield and the send methods are meant to be used in full co-routines, something like:
def doubler():
    value = 0
    while value < 100:
        value = 2 * (yield value)
And you can use it like:
>>> a = doubler()
>>> # next has to be called once, so the code will run up to the first "yield"
...
>>> a.next()
0
>>> a.send(10)
20
>>> a.send(20)
40
>>> a.send(23)
46
>>> a.send(51)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>>


Difference between using "yield" and returning a generator expression from a function

What is the difference between yielding i from a loop and returning (i for i in range(10))?
def generator1():
    for i in range(10):
        yield i

def generator2():
    return (i for i in range(10))
For example, the functions generator1() and generator2() above are written differently, but both return a generator.
Output from IDLE:
>>> generator1()
<generator object generator1 at 0x107870468>
>>> generator2()
<generator object generator2.<locals>.<genexpr> at 0x107870db0>
>>> import sys
>>> sys.getsizeof(generator1())
88
>>> sys.getsizeof(generator2())
88
As we can tell, generator2() has fewer lines of code than generator1(), and the size of the object is the same. I have some questions:
What is the difference between both functions?
What does <genexpr> mean when printing generator2()?
Which is the more suitable and efficient way of creating a generator?
The difference is where the generator is created. generator1 is a generator function, because it contains a yield statement. Generator functions always return generators. The generator is created when you invoke generator1.
generator2 is a regular function that uses a generator expression to construct a generator, and then returns it. The generator is created when the line (i for i in range(10)) is executed. But if you add more logic, generator2 can return anything else, like None. For example:
def generator2(do_generator):
    if do_generator:
        return (i for i in range(10))
    else:
        return "I quit"
You can't do anything like that with generator1. It cannot return anything except a generator.
<genexpr> is short for generator expression. In your case, that's (i for i in range(10)). Generator expressions are very similar to list comprehensions, but they produce generators rather than lists.
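If you want to check the distinction programmatically, the inspect module can tell the two apart; a small sketch, assuming the two definitions from the question:

import inspect

print(inspect.isgeneratorfunction(generator1))  # True: the def body contains yield
print(inspect.isgeneratorfunction(generator2))  # False: a regular function that returns a generator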

What is a generator and why would you use it? [duplicate]

I am reading the Python cookbook at the moment and am currently looking at generators. I'm finding it hard to get my head round.
As I come from a Java background, is there a Java equivalent? The book was speaking about 'Producer / Consumer', however when I hear that I think of threading.
What is a generator and why would you use it? Without quoting any books, obviously (unless you can find a decent, simplistic answer direct from a book). Perhaps with examples, if you're feeling generous!
Note: this post assumes Python 3.x syntax.†
A generator is simply a function which returns an object on which you can call next, such that for every call it returns some value, until it raises a StopIteration exception, signaling that all values have been generated. Such an object is called an iterator.
Normal functions return a single value using return, just like in Java. In Python, however, there is an alternative, called yield. Using yield anywhere in a function makes it a generator. Observe this code:
>>> def myGen(n):
...     yield n
...     yield n + 1
...
>>> g = myGen(6)
>>> next(g)
6
>>> next(g)
7
>>> next(g)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
As you can see, myGen(n) is a function which yields n and n + 1. Every call to next yields a single value, until all values have been yielded. for loops call next in the background, thus:
>>> for n in myGen(6):
...     print(n)
...
6
7
Likewise there are generator expressions, which provide a means to succinctly describe certain common types of generators:
>>> g = (n for n in range(3, 5))
>>> next(g)
3
>>> next(g)
4
>>> next(g)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Note that generator expressions are much like list comprehensions:
>>> lc = [n for n in range(3, 5)]
>>> lc
[3, 4]
Observe that a generator object is generated once, but its code is not run all at once. Only calls to next actually execute (part of) the code. Execution of the code in a generator stops once a yield statement has been reached, upon which it returns a value. The next call to next then causes execution to continue in the state in which the generator was left after the last yield. This is a fundamental difference with regular functions: those always start execution at the "top" and discard their state upon returning a value.
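A small sketch (an illustrative example, not from the original) that makes the lazy execution visible:

>>> def lazy():
...     print("running up to the first yield")
...     yield 1
...     print("resumed")
...     yield 2
...
>>> g = lazy()  # nothing is printed: no body code has run yet
>>> next(g)
running up to the first yield
1
>>> next(g)
resumed
2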
There are more things to be said about this subject. It is e.g. possible to send data back into a generator (reference). But that is something I suggest you do not look into until you understand the basic concept of a generator.
Now you may ask: why use generators? There are a couple of good reasons:
Certain concepts can be described much more succinctly using generators.
Instead of creating a function which returns a list of values, one can write a generator which generates the values on the fly. This means that no list needs to be constructed, meaning that the resulting code is more memory efficient. In this way one can even describe data streams which would simply be too large to fit in memory.
Generators allow for a natural way to describe infinite streams. Consider for example the Fibonacci numbers:
>>> def fib():
...     a, b = 0, 1
...     while True:
...         yield a
...         a, b = b, a + b
...
>>> import itertools
>>> list(itertools.islice(fib(), 10))
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
This code uses itertools.islice to take a finite number of elements from an infinite stream. You are advised to have a good look at the functions in the itertools module, as they are essential tools for writing advanced generators with great ease.
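For instance, itertools.takewhile can cut the same infinite stream off at a condition rather than at a fixed count (a small sketch reusing fib() from above):

>>> list(itertools.takewhile(lambda n: n < 100, fib()))
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]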
† About Python 2: in the above examples next is a function which calls the method __next__ on the given object. In Python 2 that method is instead named next, and the next() built-in only appeared in Python 2.6, so in older versions one uses a slightly different technique, namely o.next() instead of next(o):
>>> g = (n for n in range(3, 5))
>>> g.next()
3
A generator is effectively a function that returns data before it is finished, but pauses at that point, and you can resume the function from there.
>>> def myGenerator():
...     yield 'These'
...     yield 'words'
...     yield 'come'
...     yield 'one'
...     yield 'at'
...     yield 'a'
...     yield 'time'
...
>>> myGeneratorInstance = myGenerator()
>>> next(myGeneratorInstance)
'These'
>>> next(myGeneratorInstance)
'words'
and so on. The (or one) benefit of generators is that because they deal with data one piece at a time, you can deal with large amounts of data; with lists, excessive memory requirements could become a problem. Generators, just like lists, are iterable, so they can be used in the same ways:
>>> myGeneratorInstance = myGenerator()  # a fresh instance; the first one was partly consumed
>>> for word in myGeneratorInstance:
...     print word
These
words
come
one
at
a
time
Note that generators provide another way to deal with infinity, for example
>>> from time import gmtime, strftime
>>> def myGen():
...     while True:
...         yield strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())
...
>>> myGeneratorInstance = myGen()
>>> next(myGeneratorInstance)
'Thu, 28 Jun 2001 14:17:15 +0000'
>>> next(myGeneratorInstance)
'Thu, 28 Jun 2001 14:18:02 +0000'
The generator encapsulates an infinite loop, but this isn't a problem because you only get each answer every time you ask for it.
First of all, the term generator originally was somewhat ill-defined in Python, leading to lots of confusion. You probably mean iterators and iterables (see here). Then in Python there are also generator functions (which return a generator object), generator objects (which are iterators) and generator expressions (which are evaluated to a generator object).
According to the glossary entry for generator it seems that the official terminology is now that generator is short for "generator function". In the past the documentation defined the terms inconsistently, but fortunately this has been fixed.
It might still be a good idea to be precise and avoid the term "generator" without further specification.
Generators could be thought of as shorthand for creating an iterator. They behave like a Java Iterator. Example:
>>> g = (x for x in range(10))
>>> g
<generator object <genexpr> at 0x7fac1c1e6aa0>
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> list(g) # force iterating the rest
[3, 4, 5, 6, 7, 8, 9]
>>> g.next() # iterator is at the end; calling next again will throw
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Hope this helps/is what you are looking for.
Update:
As many other answers are showing, there are different ways to create a generator. You can use the parentheses syntax as in my example above, or you can use yield. Another interesting feature is that generators can be "infinite" -- iterators that don't stop:
>>> def infinite_gen():
...     n = 0
...     while True:
...         yield n
...         n = n + 1
...
>>> g = infinite_gen()
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
...
There is no Java equivalent.
Here is a bit of a contrived example:
#!/usr/bin/python

def mygen(n):
    x = 0
    while x < n:
        x = x + 1
        if x % 3 == 0:
            yield x

for a in mygen(100):
    print a
There is a loop in the generator that runs from 0 to n, and if the loop variable is a multiple of 3, it yields the variable.
During each iteration of the for loop the generator is executed. If it is the first time the generator executes, it starts at the beginning, otherwise it continues from the previous time it yielded.
I like to describe generators, to those with a decent background in programming languages and computing, in terms of stack frames.
In many languages, there is a stack on top of which is the current stack "frame". The stack frame includes space allocated for variables local to the function including the arguments passed in to that function.
When you call a function, the current point of execution (the "program counter" or equivalent) is pushed onto the stack, and a new stack frame is created. Execution then transfers to the beginning of the function being called.
With regular functions, at some point the function returns a value, and the stack is "popped". The function's stack frame is discarded and execution resumes at the previous location.
When a function is a generator, it can return a value without the stack frame being discarded, using the yield statement. The values of local variables and the program counter within the function are preserved. This allows the generator to be resumed at a later time, with execution continuing from the yield statement, and it can execute more code and return another value.
Before Python 2.5 this was all generators did. Python 2.5 added the ability to pass values back in to the generator as well. In doing so, the passed-in value is available as an expression resulting from the yield statement which had temporarily returned control (and a value) from the generator.
The key advantage to generators is that the "state" of the function is preserved, unlike with regular functions where each time the stack frame is discarded, you lose all that "state". A secondary advantage is that some of the function call overhead (creating and deleting stack frames) is avoided, though this is usually a minor advantage.
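To make the preserved frame visible, here is a small sketch using inspect.getgeneratorstate (available since Python 3.2):

import inspect

def counter():
    n = 0  # local state lives on in the suspended frame
    while True:
        yield n
        n += 1

c = counter()
print(inspect.getgeneratorstate(c))  # GEN_CREATED: frame allocated, nothing run yet
next(c)                              # executes up to the first yield
print(inspect.getgeneratorstate(c))  # GEN_SUSPENDED: frame paused, locals preserved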
It helps to make a clear distinction between the function foo, and the generator foo(n):
def foo(n):
    yield n
    yield n+1
foo is a function.
foo(6) is a generator object.
The typical way to use a generator object is in a loop:
for n in foo(6):
    print(n)
The loop prints
# 6
# 7
Think of a generator as a resumable function.
yield behaves like return in the sense that values that are yielded get "returned" by the generator. Unlike return, however, the next time the generator gets asked for a value, the generator's function, foo, resumes where it left off -- after the last yield statement -- and continues to run until it hits another yield statement.
Behind the scenes, when you call bar = foo(6), the generator object bar is created for you, with a next method.
You can call it yourself to retrieve values yielded from foo:
next(bar) # Works in Python 2.6 or Python 3.x
bar.next() # Works in Python 2.5+, but is deprecated. Use next() if possible.
When foo ends (and there are no more yielded values), calling next(bar) raises a StopIteration exception.
The only thing I can add to Stephan202's answer is a recommendation that you take a look at David Beazley's PyCon '08 presentation "Generator Tricks for Systems Programmers," which is the best single explanation of the how and why of generators that I've seen anywhere. This is the thing that took me from "Python looks kind of fun" to "This is what I've been looking for." It's at http://www.dabeaz.com/generators/.
This post will use Fibonacci numbers as a tool to build up to explaining the usefulness of Python generators.
This post will feature both C++ and Python code.
Fibonacci numbers are defined as the sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ....
Or in general:
F(0) = 0
F(1) = 1
F(n) = F(n-1) + F(n-2)
This can be transferred into a C++ function extremely easily:
size_t Fib(size_t n)
{
    // Fib(0) = 0
    if (n == 0)
        return 0;

    // Fib(1) = 1
    if (n == 1)
        return 1;

    // Fib(N) = Fib(N-2) + Fib(N-1)
    return Fib(n-2) + Fib(n-1);
}
But if you want to print the first six Fibonacci numbers, you will be recalculating a lot of the values with the above function.
For example: Fib(3) = Fib(2) + Fib(1), but Fib(2) also recalculates Fib(1). The higher the value you want to calculate, the worse off you will be.
So one may be tempted to rewrite the above by keeping track of the state in main.
// Not supported for the first two elements of Fib
size_t GetNextFib(size_t &pp, size_t &p)
{
    int result = pp + p;
    pp = p;
    p = result;
    return result;
}
int main(int argc, char *argv[])
{
    size_t pp = 0;
    size_t p = 1;
    std::cout << "0 " << "1 ";

    for (size_t i = 0; i <= 4; ++i)
    {
        size_t fibI = GetNextFib(pp, p);
        std::cout << fibI << " ";
    }
    return 0;
}
But this is very ugly, and it complicates our logic in main. It would be better to not have to worry about state in our main function.
We could return a vector of values and use an iterator to iterate over that set of values, but this requires a lot of memory all at once for a large number of return values.
So back to our old approach, what happens if we wanted to do something else besides print the numbers? We'd have to copy and paste the whole block of code in main and change the output statements to whatever else we wanted to do.
And if you copy and paste code, then you should be shot. You don't want to get shot, do you?
To solve these problems, and to avoid getting shot, we may rewrite this block of code using a callback function. Every time a new Fibonacci number is encountered, we would call the callback function.
void GetFibNumbers(size_t max, void (*FoundNewFibCallback)(size_t))
{
    if (max-- == 0) return;
    FoundNewFibCallback(0);

    if (max-- == 0) return;
    FoundNewFibCallback(1);

    size_t pp = 0;
    size_t p = 1;
    for (;;)
    {
        if (max-- == 0) return;
        int result = pp + p;
        pp = p;
        p = result;
        FoundNewFibCallback(result);
    }
}

void foundNewFib(size_t fibI)
{
    std::cout << fibI << " ";
}

int main(int argc, char *argv[])
{
    GetFibNumbers(6, foundNewFib);
    return 0;
}
This is clearly an improvement, your logic in main is not as cluttered, and you can do anything you want with the Fibonacci numbers, simply define new callbacks.
But this is still not perfect. What if you wanted to only get the first two Fibonacci numbers, and then do something, then get some more, then do something else?
Well, we could go on like we have been, and we could start adding state again into main, allowing GetFibNumbers to start from an arbitrary point.
But this will further bloat our code, and it already looks too big for a simple task like printing Fibonacci numbers.
We could implement a producer and consumer model via a couple of threads. But this complicates the code even more.
Instead let's talk about generators.
Python has a very nice language feature that solves problems like these called generators.
A generator allows you to execute a function, stop at an arbitrary point, and then continue again where you left off.
Each time returning a value.
Consider the following code that uses a generator:
def fib():
    pp, p = 0, 1
    while 1:
        yield pp
        pp, p = p, pp+p

g = fib()
for i in range(6):
    print g.next()
Which gives us the results:
0
1
1
2
3
5
The yield statement is used in conjunction with Python generators. It saves the state of the function and returns the yielded value. The next time you call next() on the generator, it will continue where the yield left off.
This is far cleaner than the callback-function code. We have cleaner code, smaller code, and not to mention much more functional code (Python allows arbitrarily large integers).
Source
I believe the first appearance of iterators and generators was in the Icon programming language, about 20 years ago.
You may enjoy the Icon overview, which lets you wrap your head around them without concentrating on the syntax (since Icon is a language you probably don't know, and Griswold was explaining the benefits of his language to people coming from other languages).
After reading just a few paragraphs there, the utility of generators and iterators might become more apparent.
I put up this piece of code which explains 3 key concepts about generators:
def numbers():
    for i in range(10):
        yield i

gen = numbers()  # this line only returns a generator object, it does not run the code defined inside numbers

for i in gen:  # we iterate over the generator and the values are printed
    print(i)

# the generator is now empty
for i in gen:  # so this for block does not print anything
    print(i)
Performance difference:
macOS Big Sur 11.1
MacBook Pro (13-inch, M1, 2020)
Chip: Apple M1
Memory: 8 GB
CASE 1
import random
import psutil  # pip install psutil
import os
from datetime import datetime

def memory_usage_psutil():
    # return the memory usage in MB
    process = psutil.Process(os.getpid())
    mem = process.memory_info().rss / float(2 ** 20)
    return '{:.2f} MB'.format(mem)

names = ['John', 'Milovan', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']

print('Memory (Before): {}'.format(memory_usage_psutil()))

def people_list(num_people):
    result = []
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        result.append(person)
    return result

t1 = datetime.now()
people = people_list(1000000)
t2 = datetime.now()

print('Memory (After) : {}'.format(memory_usage_psutil()))
print('Took {} Seconds'.format(t2 - t1))
output:
Memory (Before): 50.38 MB
Memory (After) : 1140.41 MB
Took 0:00:01.056423 Seconds
The function returns a list of 1 million results. At the bottom I print out the memory usage and the total time.
Base memory usage was around 50.38 MB; after creating the list of 1 million records it jumped by nearly 1140.41 MB, and building the list took about 1.1 seconds.
CASE 2
import random
import psutil  # pip install psutil
import os
from datetime import datetime

def memory_usage_psutil():
    # return the memory usage in MB
    process = psutil.Process(os.getpid())
    mem = process.memory_info().rss / float(2 ** 20)
    return '{:.2f} MB'.format(mem)

names = ['John', 'Milovan', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']

print('Memory (Before): {}'.format(memory_usage_psutil()))

def people_generator(num_people):
    for i in range(num_people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        yield person

t1 = datetime.now()
people = people_generator(1000000)
t2 = datetime.now()

print('Memory (After) : {}'.format(memory_usage_psutil()))
print('Took {} Seconds'.format(t2 - t1))
output:
Memory (Before): 50.52 MB
Memory (After) : 50.73 MB
Took 0:00:00.000008 Seconds
After I ran this, the memory usage is almost exactly the same. That's because the generator hasn't actually done anything yet: it's not holding those million values in memory, it's waiting for me to grab the next one.
Basically, it didn't take any time because it stops as soon as it reaches the first yield statement.
I think the generator version is a little more readable, and it also gives you big performance boosts, not only in execution time but in memory.
You can still use all of the comprehensions and generator expressions here, so you don't lose anything in that area. Those are a few reasons why you would use generators, and also some of the advantages that come along with them.
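One caveat worth noting: if you do eventually need all of the values at once, materializing the generator costs roughly what building the list did; a sketch reusing people_generator from CASE 2:

t1 = datetime.now()
people = list(people_generator(1000000))  # forces the generator to run to completion
t2 = datetime.now()

print('Memory (After) : {}'.format(memory_usage_psutil()))
print('Took {} Seconds'.format(t2 - t1))  # now comparable to the list version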
Experience with list comprehensions has shown their widespread utility throughout Python. However, many of the use cases do not need to have a full list created in memory. Instead, they only need to iterate over the elements one at a time.
For instance, the following summation code will build a full list of squares in memory, iterate over those values, and, when the reference is no longer needed, delete the list:
sum([x*x for x in range(10)])
Memory is conserved by using a generator expression instead:
sum(x*x for x in range(10))
Similar benefits are conferred on constructors for container objects:
s = Set(word for line in page for word in line.split())
d = dict( (k, func(k)) for k in keylist)
Generator expressions are especially useful with functions like sum(), min(), and max() that reduce an iterable input to a single value:
max(len(line) for line in file if line.strip())
(See PEP 289, "Generator Expressions", for more.)

Why can't you toggle a function generator's behavior by an argument?

Consider these two functions:
def foo():
    x = 0
    while True:
        yield x
        x += 1

def wrap_foo(limit=10, gen=True):
    fg = foo()
    count = 0
    if gen:
        while count < limit:
            yield next(fg)
            count += 1
    else:
        return [next(fg) for _ in range(limit)]
foo() is a generator, and wrap_foo() just puts a limit on how much data gets generated. I was experimenting with having the wrapper behave as a generator with gen=True, or as a regular function that puts all generated data into memory directly with the kwarg gen=False.
The regular generator behavior works as I'd expect:
In [1352]: [_ for _ in wrap_foo(gen=True)]
Out[1352]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
However, with gen=False, nothing gets generated.
In [1351]: [num for num in wrap_foo(gen=False)]
Out[1351]: []
It seems like Python pre-classifies the function as a generator based on the presence of the yield statement (latter example works perfectly if yield is commented out).
Why is this? I would like to understand the mechanisms at play here. I'm running Python 3.6.
It seems like Python pre-classifies the function as a generator based on the presence of the yield statement
Yes, that's exactly what happens. wrap_foo is determined to be a generator at function definition time. You could consider using generator expressions instead:
def wrap_foo(limit=10, gen=True):
    fg = foo()
    if gen:
        return (next(fg) for _ in range(limit))
    else:
        return [next(fg) for _ in range(limit)]
It seems like Python pre-classifies the function as a generator based on the presence of the yield statement (latter example works perfectly if yield is commented out).
Why is this?
Because Python can't wait until the function actually executes a yield to decide whether it's a generator. First, generators are defined to not execute any of their code until the first next. Second, a generator might never actually reach any of its yield statements, if it happens to not generate any elements.
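A small sketch (under the question's Python 3.6) making both points visible - the classification happens at definition time, and a return value inside a generator is simply discarded by iteration:

import inspect

def f(gen=True):
    if gen:
        yield 1
    else:
        return [1]  # inside a generator, return just ends the iteration early

print(inspect.isgeneratorfunction(f))  # True, decided at definition time
print(list(f(gen=False)))              # [] - the returned list is discarded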

What is a "yield" statement in a function? [duplicate]

Possible Duplicate:
The Python yield keyword explained
Can someone explain to me what the yield statement actually does in this bit of code here:
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a+b

for number in fibonacci():  # Use the generator as an iterator
    print number
What I understand so far is, we are defining a function fibonacci(), with no parameters?
Inside the function we are setting a and b equal to 0 and 1; next, while this is true, we are yielding a. What is this actually doing? Furthermore, after yielding a, a is set equal to b, while b is set equal to a + b.
Next question, for number in fibonacci(), does this mean for every number in the function or what? I'm equally stumped on what yield and 'for number' are actually doing. Obviously I am aware that it means for every number in fibonacci() print number. Am I actually defining number without knowing it?
Thanks, sorry if I'm not clear. BTW, it's for project Euler, if I knew how to program well this would be a breeze but I'm trying to learn this on the fly.
Using yield makes the function a generator.
The generator will continue to yield the a variable on each loop, waiting until the generator's next() method is called to continue on to the next loop iteration.
Or, until you return or StopIteration is raised.
Slightly modified to show use of StopIteration:
>>> def fib():
... a = 0
... b = 1
... while True:
... yield a
... a = b
... b += a
... if a > 100:
... raise StopIteration
...
>>>
>>> for value in fib():
...     print value
...
0
1
2
4
8
16
32
64
>>>
>>> # assign the resulting object to 'generator'
>>> generator = fib()
>>> generator.next()
0
>>> generator.next()
1
>>> for value in generator:
...     print value
...
2
4
8
16
32
64
>>>
Generators have a special property of being iterables which do not consume memory for their values.
They do this by calculating each new value only when it is required, while being iterated.
i.e.
def f():
    a = 2
    yield a
    a += 1

for ele in f():
    print ele
would print
2
So you are using a function as an iterable that keeps returning values.
This is especially useful when the data would require heavy memory usage, and so you cannot afford the use of a list comprehension
i.e.
li = [ele*10 for ele in range(10)]
takes 10 memory spaces for ints as a list
but if you simple want to iterate over it, not access it individually
it would be very memory efficient to instead use
def f():
    i = 0
    while i < 10:
        yield i*10
        i += 1
which would use 1 memory space as i keeps being reused
a short cut for this is
ge = (i*10 for i in range(10))
you can do any of the following
for ele in f():
for ele in li:
for ele in ge:
to obtain equivalent results
When the code calls fibonacci a special generator object is created. Please note, that no code gets executed - only a generator object is returned. When you are later calling its next method, the function executes until it encounters a yield statement. The object that is supplied to yield is returned. When you call next method again the function executes again until it encounters a yield. When there are no more yield statements and the end of function is reached, a StopIteration exception is raised.
Please note that the objects inside the function are preserved between the calls to next. It means, when the code continues execution on the next loop, all the objects that were in the scope from which yield was called have their values from the point where a previous next call returned.
The cool thing about generators is that they allow convenient iteration with for loops.
The for loop obtains a generator from the result of the fibonacci call and then executes the loop, retrieving elements using the next method of the generator object until a StopIteration exception is encountered.
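Roughly, the for loop is doing this behind the scenes (a sketch, not the exact interpreter code):

gen = fibonacci()  # the generator object is already an iterator
while True:
    try:
        number = next(gen)  # gen.next() in older Python 2
    except StopIteration:
        break
    print number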
This answer is a great explanation of the yield statement, and also of iterators and generators.
Specifically here, the first call to fibonacci() will initialize a to 0 and b to 1, enter the while loop, and return a.
Any subsequent call will resume after the yield statement, assign b to a and a+b to b, then go on to the next iteration of the while statement, reach the yield statement again, and return a again.

Hidden features of Python [closed]

What are the lesser-known but useful features of the Python programming language?
Try to limit answers to Python core.
One feature per answer.
Give an example and short description of the feature, not just a link to documentation.
Label the feature using a title as the first line.
Quick links to answers:
Argument Unpacking
Braces
Chaining Comparison Operators
Decorators
Default Argument Gotchas / Dangers of Mutable Default arguments
Descriptors
Dictionary default .get value
Docstring Tests
Ellipsis Slicing Syntax
Enumeration
For/else
Function as iter() argument
Generator expressions
import this
In Place Value Swapping
List stepping
__missing__ items
Multi-line Regex
Named string formatting
Nested list/generator comprehensions
New types at runtime
.pth files
ROT13 Encoding
Regex Debugging
Sending to Generators
Tab Completion in Interactive Interpreter
Ternary Expression
try/except/else
Unpacking+print() function
with statement
Chaining comparison operators:
>>> x = 5
>>> 1 < x < 10
True
>>> 10 < x < 20
False
>>> x < 10 < x*10 < 100
True
>>> 10 > x <= 9
True
>>> 5 == x > 4
True
In case you're thinking it's doing 1 < x, which comes out as True, and then comparing True < 10, which is also True, then no, that's really not what happens (see the last example.) It's really translating into 1 < x and x < 10, and x < 10 and 10 < x * 10 and x*10 < 100, but with less typing and each term is only evaluated once.
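A quick sketch confirming the single evaluation:

>>> def five():
...     print "evaluated once"
...     return 5
...
>>> 1 < five() < 10
evaluated once
True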
Get the python regex parse tree to debug your regex.
Regular expressions are a great feature of python, but debugging them can be a pain, and it's all too easy to get a regex wrong.
Fortunately, python can print the regex parse tree, by passing the undocumented, experimental, hidden flag re.DEBUG (actually, 128) to re.compile.
>>> re.compile("^\[font(?:=(?P<size>[-+][0-9]{1,2}))?\](.*?)[/font]",
re.DEBUG)
at at_beginning
literal 91
literal 102
literal 111
literal 110
literal 116
max_repeat 0 1
subpattern None
literal 61
subpattern 1
in
literal 45
literal 43
max_repeat 1 2
in
range (48, 57)
literal 93
subpattern 2
min_repeat 0 65535
any None
in
literal 47
literal 102
literal 111
literal 110
literal 116
Once you understand the syntax, you can spot your errors. There we can see that I forgot to escape the [] in [/font].
Of course you can combine it with whatever flags you want, like commented regexes:
>>> re.compile("""
^ # start of a line
\[font # the font tag
(?:=(?P<size> # optional [font=+size]
[-+][0-9]{1,2} # size specification
))?
\] # end of tag
(.*?) # text between the tags
\[/font\] # end of the tag
""", re.DEBUG|re.VERBOSE|re.DOTALL)
enumerate
Wrap an iterable with enumerate and it will yield the item along with its index.
For example:
>>> a = ['a', 'b', 'c', 'd', 'e']
>>> for index, item in enumerate(a): print index, item
...
0 a
1 b
2 c
3 d
4 e
>>>
References:
Python tutorial—looping techniques
Python docs—built-in functions—enumerate
PEP 279
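enumerate also accepts a start value (since Python 2.6); a small sketch:

>>> for index, item in enumerate(a, start=1):
...     print index, item
...
1 a
2 b
3 c
4 d
5 e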
Creating generator objects
If you write
x=(n for n in foo if bar(n))
you can get out the generator and assign it to x. Now it means you can do
for n in x:
The advantage of this is that you don't need intermediate storage, which you would need if you did
x = [n for n in foo if bar(n)]
In some cases this can lead to significant speed up.
You can append many if statements to the end of the generator, basically replicating nested for loops:
>>> n = ((a,b) for a in range(0,2) for b in range(4,6))
>>> for i in n:
...     print i
(0, 4)
(0, 5)
(1, 4)
(1, 5)
iter() can take a callable argument
For instance:
def seek_next_line(f):
    for c in iter(lambda: f.read(1), '\n'):
        pass
The iter(callable, until_value) function repeatedly calls callable and yields its result until until_value is returned.
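Another common pattern along the same lines (a sketch; the filename and process() are hypothetical) reads a file in fixed-size chunks until read() returns the empty string at EOF:

with open('data.bin', 'rb') as f:  # hypothetical file
    for chunk in iter(lambda: f.read(4096), ''):
        process(chunk)  # hypothetical per-chunk handler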
Be careful with mutable default arguments
>>> def foo(x=[]):
...     x.append(1)
...     print x
...
>>> foo()
[1]
>>> foo()
[1, 1]
>>> foo()
[1, 1, 1]
Instead, you should use a sentinel value denoting "not given" and replace with the mutable you'd like as default:
>>> def foo(x=None):
...     if x is None:
...         x = []
...     x.append(1)
...     print x
>>> foo()
[1]
>>> foo()
[1]
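The gotcha exists because the default value is evaluated once, at definition time, and then shared between calls; a sketch that makes the sharing visible:

>>> def foo(x=[]):
...     x.append(1)
...
>>> foo.__defaults__   # the single shared default (func_defaults before Python 2.6)
([],)
>>> foo(); foo()
>>> foo.__defaults__   # the same list object, mutated by both calls
([1, 1],)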
Sending values into generator functions. For example having this function:
def mygen():
    """Yield 5 until something else is passed back via send()"""
    a = 5
    while True:
        f = (yield a)  # yield a and possibly get f in return
        if f is not None:
            a = f      # store the new value
You can:
>>> g = mygen()
>>> g.next()
5
>>> g.next()
5
>>> g.send(7) #we send this back to the generator
7
>>> g.next() #now it will yield 7 until we send something else
7
If you don't like using whitespace to denote scopes, you can use the C-style {} by issuing:
from __future__ import braces
The step argument in slice operators. For example:
>>> a = [1, 2, 3, 4, 5]
>>> a[::2]  # iterate over the whole list in 2-increments
[1, 3, 5]
The special case x[::-1] is a useful idiom for 'x reversed'.
>>> a[::-1]
[5, 4, 3, 2, 1]
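The step combines with start and stop, and works on any sequence type, strings included; a couple of quick sketches:

>>> a[1::2]  # every second element, starting at index 1
[2, 4]
>>> "hello"[::-1]
'olleh'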
Decorators
Decorators allow you to wrap a function or method in another function that can add functionality, modify arguments or results, etc. You write decorators one line above the function definition, beginning with an "at" sign (@).
The example below shows a print_args decorator that prints the decorated function's arguments before calling it:
>>> def print_args(function):
...     def wrapper(*args, **kwargs):
...         print 'Arguments:', args, kwargs
...         return function(*args, **kwargs)
...     return wrapper
...
>>> @print_args
... def write(text):
...     print text
...
>>> write('foo')
Arguments: ('foo',) {}
foo
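A common refinement (a sketch, not part of the original example) is to apply functools.wraps inside the decorator, so the wrapped function keeps its name and docstring:

import functools

def print_args(function):
    @functools.wraps(function)  # wrapper now reports function's __name__ and __doc__
    def wrapper(*args, **kwargs):
        print 'Arguments:', args, kwargs
        return function(*args, **kwargs)
    return wrapper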
The for...else syntax (see http://docs.python.org/ref/for.html )
for i in foo:
    if i == 0:
        break
else:
    print("i was never 0")
The "else" block will be normally executed at the end of the for loop, unless the break is called.
The above code could be emulated as follows:
found = False
for i in foo:
    if i == 0:
        found = True
        break
if not found:
    print("i was never 0")
From 2.5 onwards dicts have a special method __missing__ that is invoked for missing items:
>>> class MyDict(dict):
...     def __missing__(self, key):
...         self[key] = rv = []
...         return rv
...
>>> m = MyDict()
>>> m["foo"].append(1)
>>> m["foo"].append(2)
>>> dict(m)
{'foo': [1, 2]}
There is also a dict subclass in collections called defaultdict that does pretty much the same but calls a function without arguments for not existing items:
>>> from collections import defaultdict
>>> m = defaultdict(list)
>>> m["foo"].append(1)
>>> m["foo"].append(2)
>>> dict(m)
{'foo': [1, 2]}
I recommend converting such dicts to regular dicts before passing them to functions that don't expect such subclasses. A lot of code uses d[a_key] and catches KeyError to check whether an item exists; with such a subclass, that check would silently add a new item to the dict.
In-place value swapping
>>> a = 10
>>> b = 5
>>> a, b
(10, 5)
>>> a, b = b, a
>>> a, b
(5, 10)
The right-hand side of the assignment is an expression that creates a new tuple. The left-hand side of the assignment immediately unpacks that (unreferenced) tuple to the names a and b.
After the assignment, the new tuple is unreferenced and marked for garbage collection, and the values bound to a and b have been swapped.
As noted in the Python tutorial section on data structures,
Note that multiple assignment is really just a combination of tuple packing and sequence unpacking.
Readable regular expressions
In Python you can split a regular expression over multiple lines, name your matches and insert comments.
Example verbose syntax (from Dive into Python):
>>> pattern = """
... ^ # beginning of string
... M{0,4} # thousands - 0 to 4 M's
... (CM|CD|D?C{0,3}) # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
... # or 500-800 (D, followed by 0 to 3 C's)
... (XC|XL|L?X{0,3}) # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
... # or 50-80 (L, followed by 0 to 3 X's)
... (IX|IV|V?I{0,3}) # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
... # or 5-8 (V, followed by 0 to 3 I's)
... $ # end of string
... """
>>> re.search(pattern, 'M', re.VERBOSE)
Example naming matches (from Regular Expression HOWTO)
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search( '(((( Lots of punctuation )))' )
>>> m.group('word')
'Lots'
You can also verbosely write a regex without using re.VERBOSE thanks to string literal concatenation.
>>> pattern = (
... "^" # beginning of string
... "M{0,4}" # thousands - 0 to 4 M's
... "(CM|CD|D?C{0,3})" # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
... # or 500-800 (D, followed by 0 to 3 C's)
... "(XC|XL|L?X{0,3})" # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
... # or 50-80 (L, followed by 0 to 3 X's)
... "(IX|IV|V?I{0,3})" # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
... # or 5-8 (V, followed by 0 to 3 I's)
... "$" # end of string
... )
>>> print pattern
"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
Function argument unpacking
You can unpack a list or a dictionary as function arguments using * and **.
For example:
def draw_point(x, y):
    # do some magic
    pass

point_foo = (3, 4)
point_bar = {'y': 3, 'x': 2}

draw_point(*point_foo)
draw_point(**point_bar)
Very useful shortcut since lists, tuples and dicts are widely used as containers.
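The mirror image of the same syntax collects arguments on the receiving side; a small sketch:

def show(*args, **kwargs):
    print args, kwargs

show(*point_foo, **point_bar)  # prints the tuple and dict, e.g. (3, 4) {'y': 3, 'x': 2}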
ROT13 is a valid encoding for source code, when you use the right coding declaration at the top of the code file:
#!/usr/bin/env python
# -*- coding: rot13 -*-
cevag "Uryyb fgnpxbiresybj!".rapbqr("rot13")
Creating new types in a fully dynamic manner
>>> NewType = type("NewType", (object,), {"x": "hello"})
>>> n = NewType()
>>> n.x
"hello"
which is exactly the same as
>>> class NewType(object):
...     x = "hello"
>>> n = NewType()
>>> n.x
"hello"
Probably not the most useful thing, but nice to know.
Edit: Fixed name of new type, should be NewType to be the exact same thing as with class statement.
Edit: Adjusted the title to more accurately describe the feature.
Context managers and the "with" Statement
Introduced in PEP 343, a context manager is an object that acts as a run-time context for a suite of statements.
Since the feature makes use of new keywords, it is introduced gradually: it is available in Python 2.5 via the __future__ directive. Python 2.6 and above (including Python 3) has it available by default.
I have used the "with" statement a lot because I think it's a very useful construct, here is a quick demo:
from __future__ import with_statement

with open('foo.txt', 'w') as f:
    f.write('hello!')
What's happening here behind the scenes, is that the "with" statement calls the special __enter__ and __exit__ methods on the file object. Exception details are also passed to __exit__ if any exception was raised from the with statement body, allowing for exception handling to happen there.
What this does for you in this particular case is that it guarantees that the file is closed when execution falls out of scope of the with suite, regardless if that occurs normally or whether an exception was thrown. It is basically a way of abstracting away common exception-handling code.
Other common use cases for this include locking with threads and database transactions.
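Implementing the protocol yourself is straightforward; a minimal sketch with a hypothetical Timer class:

import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc_value, tb):
        print 'took %.3f seconds' % (time.time() - self.start)
        return False  # falsy return: any exception propagates normally

with Timer():
    sum(xrange(10000000))  # arbitrary workload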
Dictionaries have a get() method
Dictionaries have a 'get()' method. If you do d['key'] and key isn't there, you get an exception. If you do d.get('key'), you get back None if 'key' isn't there. You can add a second argument to get that item back instead of None, eg: d.get('key', 0).
It's great for things like adding up numbers:
sum[value] = sum.get(value, 0) + 1
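For instance, a complete tally loop (a small sketch; a name other than sum is used here, since the snippet above shadows the built-in):

counts = {}
for value in [1, 2, 1, 3, 1]:
    counts[value] = counts.get(value, 0) + 1
# counts is now {1: 3, 2: 1, 3: 1}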
Descriptors
They're the magic behind a whole bunch of core Python features.
When you use dotted access to look up a member (eg, x.y), Python first looks for the member in the instance dictionary. If it's not found, it looks for it in the class dictionary. If it finds it in the class dictionary, and the object implements the descriptor protocol, instead of just returning it, Python executes it. A descriptor is any class that implements the __get__, __set__, or __delete__ methods.
Here's how you'd implement your own (read-only) version of property using descriptors:
class Property(object):
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, type):
        if obj is None:
            return self
        return self.fget(obj)
and you'd use it just like the built-in property():
class MyClass(object):
    @Property
    def foo(self):
        return "Foo!"
Descriptors are used in Python to implement properties, bound methods, static methods, class methods and slots, amongst other things. Understanding them makes it easy to see why a lot of things that previously looked like Python 'quirks' are the way they are.
Raymond Hettinger has an excellent tutorial that does a much better job of describing them than I do.
Conditional Assignment
x = 3 if (y == 1) else 2
It does exactly what it sounds like: "assign 3 to x if y is 1, otherwise assign 2 to x". Note that the parens are not necessary, but I like them for readability. You can also chain it if you have something more complicated:
x = 3 if (y == 1) else 2 if (y == -1) else 1
Though at a certain point, it goes a little too far.
Note that you can use if ... else in any expression. For example:
(func1 if y == 1 else func2)(arg1, arg2)
Here func1 will be called if y is 1 and func2, otherwise. In both cases the corresponding function will be called with arguments arg1 and arg2.
Analogously, the following is also valid:
x = (class1 if y == 1 else class2)(arg1, arg2)
where class1 and class2 are two classes.
Doctest: documentation and unit-testing at the same time.
Example extracted from the Python documentation:
def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    If the result is small enough to fit in an int, return an int.
    Else return a long.

    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(-1)
    Traceback (most recent call last):
        ...
    ValueError: n must be >= 0

    Factorials of floats are OK, but the float must be an exact integer:
    """
    import math
    if not n >= 0:
        raise ValueError("n must be >= 0")
    if math.floor(n) != n:
        raise ValueError("n must be exact integer")
    if n+1 == n:  # catch a value like 1e300
        raise OverflowError("n too large")
    result = 1
    factor = 2
    while factor <= n:
        result *= factor
        factor += 1
    return result

def _test():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _test()
Named formatting
% -formatting takes a dictionary (also applies %i/%s etc. validation).
>>> print "The %(foo)s is %(bar)i." % {'foo': 'answer', 'bar':42}
The answer is 42.
>>> foo, bar = 'question', 123
>>> print "The %(foo)s is %(bar)i." % locals()
The question is 123.
And since locals() is also a dictionary, you can simply pass that as a dict and have %-substitutions from your local variables. I think this is frowned upon, but it simplifies things.
New Style Formatting
>>> print("The {foo} is {bar}".format(foo='answer', bar=42))
To add more python modules (especially 3rd party ones), most people seem to use PYTHONPATH environment variables or they add symlinks or directories in their site-packages directories. Another way is to use *.pth files. Here's the official python doc's explanation:
"The most convenient way [to modify
python's search path] is to add a path
configuration file to a directory
that's already on Python's path,
usually to the .../site-packages/
directory. Path configuration files
have an extension of .pth, and each
line must contain a single path that
will be appended to sys.path. (Because
the new paths are appended to
sys.path, modules in the added
directories will not override standard
modules. This means you can't use this
mechanism for installing fixed
versions of standard modules.)"
Exception else clause:
try:
    put_4000000000_volts_through_it(parrot)
except Voom:
    print "'E's pining!"
else:
    print "This parrot is no more!"
finally:
    end_sketch()
The use of the else clause is better than adding additional code to the try clause because it avoids accidentally catching an exception that wasn’t raised by the code being protected by the try ... except statement.
See http://docs.python.org/tut/node10.html
Re-raising exceptions:
# Python 2 syntax
try:
    some_operation()
except SomeError, e:
    if is_fatal(e):
        raise
    handle_nonfatal(e)

# Python 3 syntax
try:
    some_operation()
except SomeError as e:
    if is_fatal(e):
        raise
    handle_nonfatal(e)
The 'raise' statement with no arguments inside an error handler tells Python to re-raise the exception with the original traceback intact, allowing you to say "oh, sorry, sorry, I didn't mean to catch that, sorry, sorry."
If you wish to print, store or fiddle with the original traceback, you can get it with sys.exc_info(), and printing it like Python would is done with the 'traceback' module.
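A sketch of that inspection pattern (some_operation and SomeError as above):

import sys
import traceback

try:
    some_operation()
except SomeError:
    exc_type, exc_value, tb = sys.exc_info()  # capture the original exception info
    traceback.print_exception(exc_type, exc_value, tb)
    raise  # re-raise with the original traceback intact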
Main messages :)
import this
# btw look at this module's source :)
Deciphered:
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Interactive Interpreter Tab Completion
try:
    import readline
except ImportError:
    print "Unable to load readline module."
else:
    import rlcompleter
    readline.parse_and_bind("tab: complete")
>>> class myclass:
...     def function(self):
...         print "my function"
...
>>> class_instance = myclass()
>>> class_instance.<TAB>
class_instance.__class__ class_instance.__module__
class_instance.__doc__ class_instance.function
>>> class_instance.f<TAB>unction()
You will also have to set a PYTHONSTARTUP environment variable.
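For example (the file name here is an arbitrary choice):

# Save the readline/rlcompleter snippet above as ~/.pythonstartup, then:
#   export PYTHONSTARTUP=~/.pythonstartup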
Nested list comprehensions and generator expressions:
[(i,j) for i in range(3) for j in range(i) ]
((i,j) for i in range(4) for j in range(i) )
These can replace huge chunks of nested-loop code.
Operator overloading for the set builtin:
>>> a = set([1,2,3,4])
>>> b = set([3,4,5,6])
>>> a | b # Union
{1, 2, 3, 4, 5, 6}
>>> a & b # Intersection
{3, 4}
>>> a < b # Subset
False
>>> a - b # Difference
{1, 2}
>>> a ^ b # Symmetric Difference
{1, 2, 5, 6}
More detail from the standard library reference: Set Types
