Strange behavior of generator from function - python

While playing with generators, I found an interesting thing. I defined a function with the yield keyword, got a generator from it, and then deleted the variable holding the sequence that was fed to the function. And *POOF!* - the generator became empty. Here are my steps:
>>> def foo(list_):
...     for i in list_:
...         yield i*2
...
>>> val = [1, 2, 3, 4]
>>> gen = foo(val)
>>> print(tuple(gen))
(2, 4, 6, 8)
>>> del val
>>> print(tuple(gen))
()
>>>
Shouldn't it be immutable? Or, if it really works as an object that feeds all the values of its variable into its function and produces the output, why doesn't it throw an exception because the linked sequence no longer exists? That example could be explained as iterating over an empty sequence, in which case the block for _ in []: would never start. But I can't explain why this does not throw an exception either:
>>> def foo(list_):
...     for i in list_:
...         yield i*2
...
>>> val = [1, 2, 3, 4]
>>> gen = foo(val)
>>> print(tuple(gen))
(2, 4, 6, 8)
>>> del foo
>>> print(tuple(gen))
()
>>>
Do generators act here similarly to the dict.get() function? I don't understand this.

This has nothing to do with objects being deleted. Instead, you have exhausted the generator; generators can only be iterated over once. Create a new generator from the generator function if you need to iterate a second time. You see the same behaviour without deleting references:
>>> def foo(list_):
...     for i in list_:
...         yield i*2
...
>>> val = [1, 2, 3, 4]
>>> gen = foo(val)
>>> list(gen)
[2, 4, 6, 8]
>>> list(gen) # gen is now exhausted, so no more elements are produced
[]
>>> gen = foo(val) # new generator
>>> list(gen)
[2, 4, 6, 8]
Note that del foo or del val only deletes the reference to an object; it wouldn't delete the function or list object altogether if there are other references to it, such as from an existing generator created from the function. So gen = foo(val); del foo, val won't break the gen object: it can still produce values without either the foo or val references existing, because gen itself still references what it needs to complete:
>>> gen = foo(val)
>>> del foo, val
>>> list(gen) # gen still references the list and code objects
[2, 4, 6, 8]
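As a side note, in CPython you can even peek at the frame the generator is holding on to; gi_frame is an implementation detail, so treat this as an illustrative poke rather than something to rely on:
>>> def foo(list_):
...     for i in list_:
...         yield i*2
...
>>> val = [1, 2, 3, 4]
>>> gen = foo(val)
>>> del foo, val
>>> gen.gi_frame.f_locals['list_']   # the generator's frame still holds the list
[1, 2, 3, 4]
>>> list(gen)
[2, 4, 6, 8]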

generating tuple vs list [duplicate]

When building a list with a comprehension, we do not use the built-in "list" to specify that it is a list; the "[]" alone does it.
But when the same style/pattern is used for a tuple, it does not work.
l = [x for x in range(8)]
print(l)
y= ((x for x in range(8)))
print(y)
Output:
[0, 1, 2, 3, 4, 5, 6, 7]
<generator object <genexpr> at 0x000001D1DB7696D0>
Process finished with exit code 0
When "tuple" is specified it displays it right.
Question is:- In the code "list" is not explicitly mentioned but "tuple". Could you tell me why?
l = [x for x in range(8)]
print(l)
y= tuple((x for x in range(8)))
print(y)
Output:
[0, 1, 2, 3, 4, 5, 6, 7]
(0, 1, 2, 3, 4, 5, 6, 7)
Process finished with exit code 0
Using () around a comprehension gives a generator expression, not a tuple:
>>> ((x for x in range(8)))
<generator object <genexpr> at 0x0000013FE1AD6040>
>>>
As mentioned in the documentation:
Generator iterators are created by the yield keyword. The real difference between them and ordinary functions is that yield unlike return is both exit and entry point for the function’s body. That means, after each yield call not only the generator returns something but also remembers its state. Calling the next() method brings control back to the generator starting after the last executed yield statement. Each yield statement is executed only once, in the order it appears in the code. After all the yield statements have been executed iteration ends.
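A minimal illustration of that resume-after-yield behaviour:
>>> def counter():
...     yield 1
...     yield 2
...
>>> g = counter()
>>> next(g)
1
>>> next(g)   # resumes right after the first yield
2
>>> next(g)   # no yields left, so iteration ends
Traceback (most recent call last):
  ...
StopIteration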
A generator in a class would be something like:
class Generator:
    def __init__(self, lst):
        self.lst = lst

    def __iter__(self):
        # a generator method: yields whatever the underlying iterator has left
        it = iter(self.lst)
        yield from it

    def __next__(self):
        # note: this relies on self.lst itself being an iterator (as in the usage
        # below); for a plain list, iter() would restart from the beginning each call
        it = iter(self.lst)
        return next(it)
Usage:
>>> x = Generator(i for i in range(5))
>>> next(x)
0
>>> next(x)
1
>>> next(x)
2
>>> for i in x:
...     print(i)
...
3
4
>>>
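Note also that when the generator expression is the only argument to a call, the extra parentheses are optional, and unpacking into a tuple display works as well:
>>> tuple(x for x in range(8))
(0, 1, 2, 3, 4, 5, 6, 7)
>>> (*(x for x in range(8)),)
(0, 1, 2, 3, 4, 5, 6, 7)
>>>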

Why does popping from the original list make reversed(original_list) empty?

I have the following code:
s = [1,2,3]
t = reversed(s)
for i in t:
    print(i)
# output: 3,2,1
If I pop one element from s (original), then the t (reversed) is emptied:
s = [1,2,3]
t = reversed(s)
s.pop()
for i in t:
    print(i)
# expected output: 2, 1
# actual output (nothing):
Why does this happen?
Taking a look at the cpython code on GitHub, we can get some intuition as to why it no longer works.
The iterator that is returned essentially requires knowing the position of the last index and the length of the array. If the size of the array is changed, the iterator will no longer work.
Test 1: Increasing the array length
This will not produce the correct results either, but the iterator does run:
s = [1,2,3]
t = reversed(s)
s.append(4)
for i in t:
    print(i)
# output: [3, 2, 1]
Test 2: Decreasing, then increasing the length
s = [1,2,3]
t = reversed(s)
s.pop()
s.append(4)
for i in t:
    print(i)
# output: [4, 2, 1]
It still works!
So there's an internal check to see whether or not the last index is still valid, and if it is, it's a simple for loop down to index 0.
If it doesn't work, the iterator returns empty.
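To make that concrete, here is a rough Python sketch of how the list reverse-iterator behaves; the real implementation is in C, and the names here are made up for illustration:
class ReversedSketch:
    """Rough model of list_reverseiterator: it remembers an index, not the items."""
    def __init__(self, lst):
        self.lst = lst
        self.index = len(lst) - 1   # position of the next item to hand out

    def __iter__(self):
        return self

    def __next__(self):
        # The stored index must still be a valid position in the (possibly
        # mutated) list; if the list shrank below it, iteration simply ends.
        if 0 <= self.index < len(self.lst):
            item = self.lst[self.index]
            self.index -= 1
            return item
        self.index = -1
        raise StopIteration

s = [1, 2, 3]
t = ReversedSketch(s)
s.pop()
print(list(t))   # [] -- same as the built-in reversed() in this situation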
Calling reversed returns an iterator over that list: a special object that lets you iterate over the original list in reverse order. It is not a new list, and it is single-use only:
>>> s= [1,2,3]
>>> t = reversed(s)
>>> t
<list_reverseiterator object at 0x00000261BE8F0C40>
>>> list(t)
[3, 2, 1]
>>> list(t)
[]
>>>
and because this iterator references the original list, any change to the list is reflected when you iterate over the iterator later.
Update
In particular, as MZ explains, if the change leaves the list in a different state from the one it had when the iterator was created, you get nothing if the size decreased, or an incomplete view of the list if it increased:
>>> s= [1,2,3]
>>> t = reversed(s)
>>> s.insert(0,23)
>>> s
[23, 1, 2, 3]
>>> list(t)
[2, 1, 23]
>>> t = reversed(s)
>>> s.append(32)
>>> list(t)
[3, 2, 1, 23]
>>> s
[23, 1, 2, 3, 32]
>>> t = reversed(s)
>>> s.pop()
32
>>> list(t)
[]
>>>

How to understand scope within generators?

I seem to misunderstand variable scope within generators. Why do the following two results differ? (Note the second use of tuple when producing the second result.)
import itertools

def f(x): return x
result1 = tuple(itertools.chain(*((f(val) for _ in range(2)) for val in (1,2,3))))
result2 = tuple(itertools.chain(*(tuple(f(val) for _ in range(2)) for val in (1,2,3))))
print(result1==result2) # False; why?
Fundamentally, scope is working like it always works. Generators just create a local, enclosing scope, just like a function. Essentially, you are creating a closure over val, and in Python, closures are lexically scoped and late-binding, i.e., their value is looked up at the point of execution, not definition.
The difference between the two is when the outer generator gets iterated over versus the inner generators. In your first example, the outer generator is iterated completely before any of the inner generators are; in the second example, tuple forces them to be evaluated in tandem.
The problem is that the * argument splatting immediately evaluates the outer generator. The inner generators are not evaluated yet; each is closed over val, and by the time they do run, val is 3, the value it held at the end of the outer generator.
But, in your second example,
(tuple(f(val) for _ in range(2)) for val in (1,2,3))
The inner call to tuple forces f to be called while val is still 1, 2, and 3 respectively, so those are the values that end up in the result.
So, consider the following nested generator, and two different ways of iterating over them:
>>> def gen():
...     for i in range(3):
...         yield (i for _ in range(2))
...
>>> data = list(gen()) # essentially what you are doing with the splatting
>>> for item in data:
...     print(list(item))
...
[2, 2]
[2, 2]
[2, 2]
>>> for item in gen():
...     print(list(item))
...
[0, 0]
[1, 1]
[2, 2]
>>>
And finally, this should also be informative:
>>> gs = []
>>> for item in gen():
...     gs.append(item)
...
>>> gs
[<generator object gen.<locals>.<genexpr> at 0x1041ceba0>, <generator object gen.<locals>.<genexpr> at 0x1041cecf0>, <generator object gen.<locals>.<genexpr> at 0x1041fc200>]
>>> [list(g) for g in gs]
[[2, 2], [2, 2], [2, 2]]
Again, you have to think about what the closed-over value will be at the moment the inner generator is actually evaluated. In the case above, I had already iterated over the outer generator completely (so i is 2) while only appending the inner generators to another list; when I evaluate those inner generators afterwards, they all see i as 2, because that is its current value.
To reiterate, this occurs because * splatting forces the outer generator to be iterated over. Use chain.from_iterable instead and you'll get True for result1 == result2.
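A quick check of that claim, reusing the same f (a sketch of the fix suggested above):
import itertools

def f(x): return x

# from_iterable pulls from the outer generator lazily, so each inner
# generator is consumed while val still holds the value it was built with
result1 = tuple(itertools.chain.from_iterable(
    (f(val) for _ in range(2)) for val in (1, 2, 3)))
result2 = tuple(itertools.chain(*(tuple(f(val) for _ in range(2)) for val in (1, 2, 3))))
print(result1 == result2)   # True -- both are (1, 1, 2, 2, 3, 3)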

Are there efficiency differences in extend vs. adding vs. appending in Python?

I refer to list operations:
L = myList + otherList
L = myList.append([5])
L = myList.extend(otherList)
I am curious if there are efficiency differences among these operations.
These are totally different operations.
They have different purposes, so efficiency wouldn't matter. append is used to append a single value to a list, extend is used for multiple values, and the addition is for when you don't want to modify the original list, but to have another list with the extra values added on.
>>> lst = [1, 2, 3]
>>> lst2 = [5, 6]
>>> lst.append(4) # appending
>>> lst
[1, 2, 3, 4]
>>> lst.extend(lst2) # extending
>>> lst
[1, 2, 3, 4, 5, 6]
>>> lst + lst2 # addition
[1, 2, 3, 4, 5, 6, 5, 6]
Also note that list.append and list.extend operate in-place, so assigning the result to a variable will make that variable hold the value None.
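For example, assigning the result of append shows this:
>>> lst = [1, 2, 3]
>>> result = lst.append(4)   # append mutates lst and returns None
>>> print(result)
None
>>> lst
[1, 2, 3, 4]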
Your example here is sort of misleading in the case of append.
>>> l1 = [1,2,3,4]
>>> l1.append([5])
>>> l1
[1, 2, 3, 4, [5]]
Append takes a single item and appends it to the end of the existing list. By passing in an iterable to append, you're adding another list (in this case) within a list.
extend takes an iterable and essentially calls append for each item in the iterable, adding the items onto the end of the existing list.
The mylist + otherlist is the only interesting case here, as the result of using the + operator creates a new list, using more memory.
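A quick check showing that + builds a new list object and leaves the originals untouched:
>>> a = [1, 2, 3]
>>> b = [4, 5]
>>> c = a + b
>>> c
[1, 2, 3, 4, 5]
>>> a          # unchanged
[1, 2, 3]
>>> c is a
False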
Timing them answers your question about efficiency in regards of speed:
import timeit

def first():
    mylist + otherlist

def second():
    mylist.append(otherlist)

def third():
    mylist.extend(otherlist)

for test in (first, second, third):
    mylist = [1, 2, 3, 4]
    otherlist = [5]
    print "%s: %f" % (test, timeit.timeit(test, number=1000000))
On my machine the result was:
<function first at 0x10ff3ba28>: 0.320835
<function second at 0x10ff3baa0>: 0.275077
<function third at 0x10ff3bb18>: 0.284508
Showing that the first example was clearly slowest.

reverse method mutating input

For an assignment we were asked to create a function that would reverse all the elements in an arbitrarily nested list. So inputs to the function should return something like this:
>>> seq = [1,[2,[3]]]
>>> print arb_reverse(seq)
[[[3],2],1]
>>> seq = [9,[[],[0,1,[[],[2,[[],3]]]],[],[[[4],5]]]]
>>> print arb_reverse(seq)
[[[[5,[4]]],[],[[[[3,[]],2],[]],1,0],[]],9]
I came up with a recursive solution which works well:
def arb_reverse(seq):
    result = []
    for element in reversed(seq):
        if not is_list(element):  # is_list: a helper (not shown) that checks whether element is a list
            result.append(element)
        else:
            result.append(arb_reverse(element))
    return result
But for a bit of a personal challenge I wanted to create a solution without the use of recursion. One version of this attempt resulted in some curious behavior which I am not understanding. For clarification, I was NOT expecting this version to work properly but the resulting input mutation does not make sense. Here is the iterative version in question:
def arb_reverse(seq):
    elements = list(seq)  # so input is not mutated; also tried seq[:] just to be thorough
    result = []
    while elements:
        item = elements.pop()
        if isinstance(item, list):
            item.reverse()  # this operation seems to be the culprit
            elements += item
        else:
            result.append(item)
    return result
This returns a flattened semi-reversed list (somewhat expected), but the interesting part is what it does to the input (not expected)...
>>> a = [1, [2, [3]]]
>>> arb_reverse(a)
[2, 3, 1]
>>> a
[1, [[3], 2]]
>>> p = [1, [2, 3, [4, [5, 6]]]]
>>> print arb_reverse(p)
[2, 3, 4, 5, 6, 1]
>>> print p
[1, [[[6, 5], 4], 3, 2]]
I was under the impression that by copying the values contained in the input into a variable using list() or a [:] slice, as I did with elements, I would avoid mutating the input. However, a few print statements later revealed that the reverse method had a hand in mutating the original list. Why is that?
The list() call makes a new outer list, but it is a shallow copy: the nested lists inside it are still the very same objects as in the original.
Try this (stolen from here):
from copy import deepcopy
listB = deepcopy(listA)
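Applied to the iterative arb_reverse above, that would look something like this (a sketch; only the copying line changes, the rest is the asker's code unchanged):
from copy import deepcopy

def arb_reverse(seq):
    elements = deepcopy(seq)   # nested lists are copied too, so reversing
    result = []                # them in place no longer touches the input
    while elements:
        item = elements.pop()
        if isinstance(item, list):
            item.reverse()
            elements += item
        else:
            result.append(item)
    return result

The output is still the flattened, semi-reversed list (that part of the behaviour is unchanged), but the argument passed in is left alone.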
Try running the following code through this tool http://people.csail.mit.edu/pgbovine/python/tutor.html
o1 = [1, 2, 3]
o2 = [4, 5, 6]
l1 = [o1, o2]
l2 = list(l1)
l2[0].reverse()
print l2
print l1
Specifically look at what happens when l2[0].reverse() is called.
You'll see that when you call list() to create a copy of the list, the lists still reference the same objects.
