How to understand scope within generators? - python

I seem to misunderstand variable scope within generators. Why do the following two results differ? (Note the second use of tuple when producing the second result.)
import itertools

def f(x): return x
result1 = tuple(itertools.chain(*((f(val) for _ in range(2)) for val in (1,2,3))))
result2 = tuple(itertools.chain(*(tuple(f(val) for _ in range(2)) for val in (1,2,3))))
print(result1==result2) # False; why?

Fundamentally, scope is working like it always works. Generators just create a local, enclosing scope, just like a function. Essentially, you are creating a closure over val, and in Python, closures are lexically scoped and late-binding, i.e., the closed-over variable's value is looked up when the code executes, not when it is defined.
The difference between the two is when the outer generator gets iterated over versus the inner generators. In your first example, the outer generator is iterated completely before any of the inner generators are; in the second example, tuple forces them to be evaluated in tandem.
The problem is that * argument splatting immediately evaluates your (outer) generator. The inner generators are not evaluated yet, but they are closed over val, and val is 3 by the time the outer generator has finished.
But, in your second example,
(tuple(f(val) for _ in range(2)) for val in (1,2,3))
the inner call to tuple forces f to be called while val is 1, 2 and 3 in turn, so f receives each of those values.
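A quick way to see the difference is to print both tuples. This sketch simply reruns the question's code (with the itertools import added) and shows the values each expression actually produces:

```python
import itertools

def f(x):
    return x

# * unpacks (fully consumes) the outer generator before chain sees anything,
# so every inner generator runs after the outer loop has left val at 3.
result1 = tuple(itertools.chain(*((f(val) for _ in range(2)) for val in (1, 2, 3))))

# Here tuple() consumes each inner generator while val still has its own value.
result2 = tuple(itertools.chain(*(tuple(f(val) for _ in range(2)) for val in (1, 2, 3))))

print(result1)  # (3, 3, 3, 3, 3, 3)
print(result2)  # (1, 1, 2, 2, 3, 3)
```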
So, consider the following nested generator, and two different ways of iterating over them:
>>> def gen():
...     for i in range(3):
...         yield (i for _ in range(2))
...
>>> data = list(gen()) # essentially what you are doing with the splatting
>>> for item in data:
...     print(list(item))
...
[2, 2]
[2, 2]
[2, 2]
>>> for item in gen():
...     print(list(item))
...
[0, 0]
[1, 1]
[2, 2]
>>>
And finally, this should also be informative:
>>> gs = []
>>> for item in gen():
...     gs.append(item)
...
>>> gs
[<generator object gen.<locals>.<genexpr> at 0x1041ceba0>, <generator object gen.<locals>.<genexpr> at 0x1041cecf0>, <generator object gen.<locals>.<genexpr> at 0x1041fc200>]
>>> [list(g) for g in gs]
[[2, 2], [2, 2], [2, 2]]
Again, you have to think about what the closed-over value will be when the inner generator is actually evaluated. In the case above, I have already iterated over the outer generator, so i is 2, and I simply appended the inner generators to another list. When I then evaluate the inner generators, they see the value of i as 2, because that is what it is at that point.
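One way to avoid the late binding while keeping the inner generators lazy is to put the value into the outermost iterable of the inner generator expression, because that part is evaluated immediately when the generator expression is created. gen_eager below is a hypothetical variant of gen, shown as a sketch rather than the only fix:

```python
def gen_eager():
    for i in range(3):
        # [i] * 2 is the outermost iterable, so it is evaluated right now,
        # capturing the current value of i instead of looking it up later.
        yield (x for x in [i] * 2)

data = list(gen_eager())           # exhaust the outer generator first
values = [list(g) for g in data]   # only now run the inner generators
print(values)                      # [[0, 0], [1, 1], [2, 2]]
```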
To reiterate, this happens because * splatting forces the outer generator to be iterated over completely. Use chain.from_iterable instead and result1 == result2 will be True.
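For completeness, here is the question's code with chain.from_iterable in place of splatting. chain.from_iterable pulls one inner generator at a time from the outer generator and exhausts it before advancing, so each inner generator runs while val still holds its own value:

```python
import itertools

def f(x):
    return x

# No * unpacking: the outer generator is advanced lazily, one inner generator at a time.
result1 = tuple(itertools.chain.from_iterable((f(val) for _ in range(2)) for val in (1, 2, 3)))
result2 = tuple(itertools.chain.from_iterable(tuple(f(val) for _ in range(2)) for val in (1, 2, 3)))

print(result1 == result2)  # True
print(result1)             # (1, 1, 2, 2, 3, 3)
```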

Related

Python List Mutation Doesn't Happen When Variable Reassigned. Why?

What strange magic is this?
def rotate_list(lst, n):
    n = n % len(lst)
    lst = lst[-n:] + lst[:-n]

def rotate_list_2(lst):
    lst[0], lst[1], lst[2], lst[3] = lst[3], lst[0], lst[1], lst[2]
s1 = [1, 2, 5, 4]
rotate_list(s1, 1)
print(s1)
s1 = [1, 2, 5, 4]
rotate_list_2(s1)
print(s1)
Output:
[1, 2, 5, 4]
[4, 1, 2, 5]
It appears that although lists are generally mutable within functions, if a list with the same name is created, the original list is unaffected by changes to the new list. Could someone please explain what is happening here, in terms of scope and references?
How would I rotate the original list without having to manually update each value as in rotate_list_2()? Or would this kind of thing generally be done by working with new lists returned from a function?
Assigning to the list's name inside a function doesn't change the original reference.
The assignment just rebinds the local parameter lst to the new value.
The original list referenced outside ('before') the function remains intact.
Instead, assign to its elements with slice assignment:
def rotate_list(lst, n):
    n = n % len(lst)
    lst[:] = lst[-n:] + lst[:-n]
s1 = [1, 2, 5, 4]
rotate_list(s1, 1)
# And it works like magic! :)
# [4, 1, 2, 5]
print(s1)
If you reassign a function's parameter, the new value is not visible outside the function's scope.
def test0(a):
    a = 10
    print(a)
x = 4
test0(x)
print(x)
This would result in
10
4
Assigning to the list's elements works because you are not assigning a new value to the parameter itself. Instead, you are mutating the list object that both the parameter and the caller's variable refer to, so the change is visible in outer scopes as well.
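The distinction between rebinding a name and mutating an object can be checked directly with the is operator. The two functions below are hypothetical illustrations, not part of the question:

```python
def rebind(lst):
    lst = lst[1:]       # builds a new list; only the local name lst changes
    return lst

def mutate(lst):
    lst[:] = lst[1:]    # replaces the contents of the caller's list in place
    return lst

original = [1, 2, 3]

rebound = rebind(original)
print(rebound is original, original)   # False [1, 2, 3]

mutated = mutate(original)
print(mutated is original, original)   # True [2, 3]
```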
Alternatively, you can return the new list from the function and capture it at the call site, like s = function(s).
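A minimal sketch of that return-a-new-list style, using a hypothetical helper named rotated (it returns a rotated copy and leaves its argument untouched):

```python
def rotated(lst, n):
    """Return a new list rotated right by n places; lst is not modified."""
    if not lst:
        return []
    n = n % len(lst)
    return lst[-n:] + lst[:-n]

s1 = [1, 2, 5, 4]
s1 = rotated(s1, 1)   # rebind the caller's name to the returned list
print(s1)             # [4, 1, 2, 5]
```

If you rotate frequently, collections.deque also provides an in-place rotate() method.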

mysterious behaviour of python built-in method filter in for loop

Consider the following:
a = list(range(10))
res = list(a)
for i in a:
    if i in {3, 5}:
        print('>>>', i)
        res = filter(lambda x: x != i, res)
print(list(res))
>>> 3
>>> 5
[0, 1, 2, 3, 4, 5, 6, 7, 8]
So neither 3 nor 5 was removed, but 9 is gone...
If I force the filter object to be converted to a list, it works as expected:
a = list(range(10))
res = list(a)
for i in a:
    if i in {3, 5}:
        print('>>>', i)
        # Converting the filter object to a list here makes it work as expected.
        res = list(filter(lambda x: x != i, res))
print(list(res))
>>> 3
>>> 5
[0, 1, 2, 4, 6, 7, 8, 9]
I suspect this is because the filter object is a generator, but I cannot work out exactly how the generator causes this consistent, weird behaviour. Could someone please explain the underlying reasons?
The behaviour arises from a combination of two facts:
The lambda function contains the variable i taken from the surrounding scope, which is only evaluated at execution time. Consider this example:
>>> func = lambda x: x != i # i does not even need to exist yet
>>> i = 3
>>> func(3) # now i will be used
False
Because filter returns a lazy iterator (a filter object, not a list) in Python 3, the function is evaluated lazily, when you actually iterate over it, rather than when filter is called.
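The laziness is easy to observe with a predicate that records each call. keep_odd and calls are illustrative names only:

```python
calls = []

def keep_odd(x):
    calls.append(x)      # record every time filter actually calls the predicate
    return x % 2 == 1

it = filter(keep_odd, [1, 2, 3])
calls_before = list(calls)   # snapshot: creating the filter object calls nothing
print(calls_before)          # []

odds = list(it)              # iterating is what triggers the calls
print(odds, calls)           # [1, 3] [1, 2, 3]
```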
The combined effect of these, in the first example, is that by the time that you iterate over the filter object, i has the value of 9, and this value is used in the lambda function.
The desired behaviour can be obtained by removing either (or both) of the two combined factors mentioned above:
In the lambda, force early binding by using the current value of i as the default value of a parameter (say j), so in place of lambda x: x != i, you would use:
lambda x, j=i: x != j
The expression for the default value (i.e. i) is evaluated when the lambda is defined, and because filter calls the lambda with only one argument (x), the default is never overridden at execution time.
or:
Force early execution of all iterations of the generator by converting to list immediately (as you have observed).
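Applied to the question's first loop, the default-argument fix looks like this (only the lambda changes; the chain of filters is still lazy until list() is called):

```python
a = list(range(10))
res = list(a)
for i in a:
    if i in {3, 5}:
        print('>>>', i)
        # j=i is evaluated when this lambda is created, so each filter
        # keeps the value of i from its own iteration instead of the final 9.
        res = filter(lambda x, j=i: x != j, res)
final = list(res)
print(final)  # [0, 1, 2, 4, 6, 7, 8, 9]
```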

Strange behavior of generator from function

While playing with generators, I found something interesting. I defined a function with the yield keyword, received a generator from it, and then deleted the variable holding the sequence that was passed to the function. And *POOF!* - the generator became empty. Here are my steps:
>>> def foo(list_):
...     for i in list_:
...         yield i*2
...
>>> val = [1, 2, 3, 4]
>>> gen = foo(val)
>>> print(tuple(gen))
(2, 4, 6, 8)
>>> del val
>>> print(tuple(gen))
()
>>>
Shouldn't it be immutable? Or, if it actually works as an object that feeds all values of its variable to its function to produce the output, why didn't it throw an exception once the linked sequence was gone? Admittedly, that example could be explained as iterating over an empty sequence, in which case a block like for _ in []: would never start. But I can't explain why this does not throw an exception:
>>> def foo(list_):
...     for i in list_:
...         yield i*2
...
>>> val = [1, 2, 3, 4]
>>> gen = foo(val)
>>> print(tuple(gen))
(2, 4, 6, 8)
>>> del foo
>>> print(tuple(gen))
()
>>>
Do generators act here similarly to the dict.get() method? I don't understand this.
This has nothing to do with objects being deleted. Instead, you have exhausted the generator; generators can only be iterated over once. Create a new generator from the generator function if you need to iterate a second time. You see the same behaviour without deleting references:
>>> def foo(list_):
...     for i in list_:
...         yield i*2
...
>>> val = [1, 2, 3, 4]
>>> gen = foo(val)
>>> list(gen)
[2, 4, 6, 8]
>>> list(gen) # gen is now exhausted, so no more elements are produced
[]
>>> gen = foo(val) # new generator
>>> list(gen)
[2, 4, 6, 8]
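This also explains why no exception appears: an exhausted generator signals the end by raising StopIteration, which tuple() and list() treat as the normal end of iteration. Calling next() directly makes that exception visible:

```python
def foo(list_):
    for i in list_:
        yield i * 2

gen = foo([1, 2, 3, 4])
first = tuple(gen)            # (2, 4, 6, 8): consumes every value

exhausted = False
try:
    next(gen)                 # nothing left to produce
except StopIteration:
    exhausted = True
    print("gen is exhausted")

second = tuple(gen)           # (): tuple() treats StopIteration as "done"
print(first, second)
```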
Note that del foo or del val only deletes the reference to an object. It wouldn't delete the function or list object altogether if there are other references to it, like from an existing generator created from the function. So gen = foo(val); del foo, val won't break the gen object, it can still produce values without either the foo or val references existing, because gen itself still references what it needs to complete:
>>> gen = foo(val)
>>> del foo, val
>>> list(gen) # gen still references the list and code objects
[2, 4, 6, 8]
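If you need something that can be iterated more than once, one common pattern (sketched here with a hypothetical Doubled class) is an iterable whose __iter__ is a generator function, so each loop gets a fresh generator:

```python
class Doubled:
    """Re-iterable: every call to __iter__ returns a new generator."""

    def __init__(self, items):
        self.items = items

    def __iter__(self):
        for i in self.items:
            yield i * 2

doubled = Doubled([1, 2, 3, 4])
first = list(doubled)    # [2, 4, 6, 8]
second = list(doubled)   # [2, 4, 6, 8] again, unlike a single generator object
```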

Function which removes the first item in a list (Python)

I am trying to write a function which removes the first item in a Python list. This is what I've tried. Why doesn't remove_first_wrong change l when I call the function on it? And why does the list slicing approach work when I do it in the main function?
def remove_first_wrong(lst):
    lst = lst[1:]

def remove_first_right(lst):
    lst.pop(0)

if __name__ == '__main__':
    l = [1, 2, 3, 4, 5]
    remove_first_wrong(l)
    print(l)

    l_2 = [1, 2, 3, 4, 5]
    remove_first_right(l_2)
    print(l_2)

    # Why does this work and remove_first_wrong doesn't?
    l_3 = [1, 2, 3, 4, 5]
    l_3 = l_3[1:]
    print(l_3)
Slicing a list returns a new list object, containing the elements at the indices you indicated in the slice. You then rebound lst (a local name in the function) to reference that new list instead. The old list is never altered in that process.
list.pop() on the other hand, operates on the list object itself. It doesn't matter what reference you used to reach the list.
You'd see the same thing without functions:
>>> a = [1, 2]
>>> b = a[:] # slice with all the elements, produces a *copy*
>>> b
[1, 2]
>>> a.pop() # removing an element from a won't change b
2
>>> b
[1, 2]
>>> a
[1]
Using [:] is one way of making a shallow copy of a list; see How to clone or copy a list?
You may want to read or watch Ned Batchelder's Names and Values presentation to further help understand how Python names and objects work.
Inside the function remove_first_wrong, the = sign rebinds the name lst to the object on the right, which is a brand new object created by the slicing operation lst[1:]. That new object is referenced only by the function's local name lst (and it will be discarded when the function returns).
That is what Martijn means by "You then rebound lst (a local name in the function) to reference that new list instead."
By contrast, lst.pop(0) calls a method on the given object; it operates on the object itself.
For example, this will work right too:
def remove_first_right2(lst):
    x = lst  # x is assigned to the same object as lst
    x.pop(0)  # pop the item from the object
Alternatively, you can use the del statement:
def remove_first_element(lst):
    del lst[0]
    return lst
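To compare the approaches side by side, the sketch below runs each one on a fresh list. remove_first_wrong is the question's function; the other names are hypothetical. All three in-place versions modify the caller's list, while the rebinding version does not:

```python
def remove_first_wrong(lst):
    lst = lst[1:]        # rebinds the local name only

def remove_first_pop(lst):
    lst.pop(0)           # mutates the list object

def remove_first_del(lst):
    del lst[0]           # mutates the list object

def remove_first_slice(lst):
    lst[:] = lst[1:]     # slice assignment replaces the contents in place

results = {}
for remove in (remove_first_wrong, remove_first_pop, remove_first_del, remove_first_slice):
    l = [1, 2, 3, 4, 5]
    remove(l)
    results[remove.__name__] = l

for name, value in results.items():
    print(name, value)
```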

Duplicate list items using list comprehension

I would like to duplicate the items of a list into a new list, for example
a=[1,2]
b=[[i,i] for i in a]
gives [[1, 1], [2, 2]], whereas I would like to have [1, 1, 2, 2].
I also found that I could use:
b=[i for i in a for j in a]
but it seemed like overkill to use two for loops. Is it possible to do this using a single for loop?
You want itertools.chain.from_iterable(), which takes an iterable of iterables and returns a single iterable with all the elements of the sub-iterables (flattening by one level):
import itertools
b = itertools.chain.from_iterable((i, i) for i in a)
Combined with a generator expression, you get the result you want. Obviously, if you need a list, just call list() on the iterator, but in most cases that isn't needed (and is less efficient).
If, as Ashwini suggests, you want each item len(a) times, it's simple to do that as well:
duplicates = len(a)
b = itertools.chain.from_iterable([i] * duplicates for i in a)
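Putting that together with a list() call (only needed if you actually want a list), and using a three-element list so the two variants produce different results:

```python
import itertools

a = [1, 2, 3]

# Each (i, i) tuple is flattened one level into the output.
b = list(itertools.chain.from_iterable((i, i) for i in a))
print(b)  # [1, 1, 2, 2, 3, 3]

# Repeat each element len(a) times instead of exactly twice.
duplicates = len(a)
c = list(itertools.chain.from_iterable([i] * duplicates for i in a))
print(c)  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```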
Note that none of these solutions copy i; they give you multiple references to the same element. Most of the time, that should be fine.
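Your two-loop code does not do what you want in general: the inner loop runs len(a) times for every step of the outer loop, so each item is repeated len(a) times, which only happens to equal two for your two-element example. Here is an easy solution: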
b = [j for i in a for j in (i, i)]
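The difference is easiest to see with a three-element list, where the two-loop comprehension repeats each item three times, while the (i, i) version always repeats it twice:

```python
a = [1, 2, 3]

two_loops = [i for i in a for j in a]      # inner loop runs len(a) times per i
paired = [j for i in a for j in (i, i)]    # inner loop always runs twice

print(two_loops)  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(paired)     # [1, 1, 2, 2, 3, 3]
```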
You could use range (xrange in Python 2) with a generator expression or a list comprehension:
b = (x for x in a for _ in range(2))
b = [x for x in a for _ in range(2)]
If you do not mind the order:
>>> a = [1,2]
>>> a * 2
[1, 2, 1, 2]
