generating tuple vs list [duplicate] - python

This question already has answers here: Why is there no tuple comprehension in Python? (13 answers). Closed 1 year ago.
When generating a list we do not need the builtin list to specify that it is a list; the square brackets [] alone do it.
But when the same style/pattern is used for a tuple, it does not work.
l = [x for x in range(8)]
print(l)
y= ((x for x in range(8)))
print(y)
Output:
[0, 1, 2, 3, 4, 5, 6, 7]
<generator object <genexpr> at 0x000001D1DB7696D0>
When "tuple" is specified explicitly, it displays the result right.
The question is: in the code, "list" is not explicitly mentioned, but "tuple" is. Could you tell me why?
l = [x for x in range(8)]
print(l)
y= tuple((x for x in range(8)))
print(y)
Output:
[0, 1, 2, 3, 4, 5, 6, 7]
(0, 1, 2, 3, 4, 5, 6, 7)

Using () gives you a generator expression, not a tuple:
>>> ((x for x in range(8)))
<generator object <genexpr> at 0x0000013FE1AD6040>
>>>
As mentioned in the documentation:
Generator iterators are created by the yield keyword. The real difference between them and ordinary functions is that yield unlike return is both exit and entry point for the function’s body. That means, after each yield call not only the generator returns something but also remembers its state. Calling the next() method brings control back to the generator starting after the last executed yield statement. Each yield statement is executed only once, in the order it appears in the code. After all the yield statements have been executed iteration ends.
A generator in a class would be something like:
class Generator:
    def __init__(self, lst):
        self.lst = lst

    def __iter__(self):
        it = iter(self.lst)
        yield from it

    def __next__(self):
        # This works here only because self.lst is itself an iterator
        # (a generator), so iter() returns the same object every time
        # and next() keeps advancing it. With a plain list it would
        # restart from the first element on every call.
        it = iter(self.lst)
        return next(it)
Usage:
>>> x = Generator(i for i in range(5))
>>> next(x)
0
>>> next(x)
1
>>> next(x)
2
>>> for i in x:
...     print(i)
3
4
>>>
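For comparison, the same behaviour needs no class at all; a plain generator function is the idiomatic form (gen_from is just an illustrative name):

```python
def gen_from(iterable):
    # Calling a generator function returns a generator object,
    # which is both an iterator and an iterable.
    yield from iterable

x = gen_from(i for i in range(5))
print(next(x))  # 0
print(next(x))  # 1
print(list(x))  # [2, 3, 4] -- the remaining values
```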

Related

Function in Python list comprehension, don't eval twice

I'm composing a Python list from an input list run through a transforming function. I would like to include only those items in the output list for which the result isn't None. This works:
def transform(n):
    # expensive irl, so don't execute twice
    return None if n == 2 else n**2

a = [1, 2, 3]
lst = []
for n in a:
    t = transform(n)
    if t is not None:
        lst.append(t)
print(lst)
[1, 9]
I have a hunch that this can be simplified with a comprehension. However, the straightforward solution
def transform(n):
    return None if n == 2 else n**2

a = [1, 2, 3]
lst = [transform(n) for n in a if transform(n) is not None]
print(lst)
is no good since transform() is applied twice to each entry. Any way around this?
Use the := (walrus) operator, available in Python >= 3.8:
lst = [t for n in a if (t := transform(n)) is not None]
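With the transform from the question, this gives the same result as the explicit loop while calling the function only once per element (a quick check, not part of the original answer):

```python
def transform(n):
    # expensive irl, so don't execute twice
    return None if n == 2 else n**2

a = [1, 2, 3]
lst = [t for n in a if (t := transform(n)) is not None]
print(lst)  # [1, 9]
```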
If you are not able to (or don't want to) use the walrus operator, you can use @functools.lru_cache to cache the result of calling the function and avoid calling it twice:
import functools

eggs = [2, 4, 5, 3, 2]

@functools.lru_cache
def spam(foo):
    print(foo)  # to demonstrate each call
    return None if foo % 2 else foo

print([spam(n) for n in eggs if spam(n) is not None])
output
2
4
5
3
[2, 4, 2]
Compared with the walrus operator (the currently accepted answer), this is the better option if there are duplicate values in the input list: the walrus operator always runs the function once per element of the input list. Note that you may combine functools.lru_cache with the walrus operator, e.g. for readability.
eggs = [2, 4, 5, 3, 2]

def spam(foo):
    print(foo)  # to demonstrate each call
    return None if foo % 2 else foo

print([bar for n in eggs if (bar := spam(n)) is not None])
output
2
4
5
3
2
[2, 4, 2]

Nested list comprehension with generators

I see some strange behavior on Python 3.7 with a nested list comprehension that involves a generator.
This works:
i = range(20)
n = [1, 2, 3]
result = [min(x + y for x in i) for y in n]
It does not work if i is a generator:
i = (p for p in range(20))
n = [1, 2, 3]
result = [min(x + y for x in i) for y in n]
This raises a ValueError: min() arg is an empty sequence
Now even if the generator i is wrapped with list it still creates the same error:
i = (p for p in range(20))
n = [1, 2, 3]
result = [min(x + y for x in list(i)) for y in n]
Is this a python bug or is it expected behavior? If it is expected behavior, can you explain why this does not work?
In i = range(20), the expression range(20) is like a promise: every time it is iterated, it hands out a fresh iterator.
i = (p for p in range(20)), on the other hand, is already a generator, which can be consumed only once.
Now write your list expression as:
for y in [1, 2, 3]:
    print(min(x + y for x in i))
## 1
## ...
## ValueError: min() arg is an empty sequence
You get a 1 printed, but the generator is exhausted by that first call, so in the next round you get a
ValueError: min() arg is an empty sequence
because the generator i was already consumed in the first for-loop pass with y as 1.
If i is instead defined as range(20), then every time for x in i runs, a fresh iterator is created.
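The difference is easy to see by asking each object for an iterator (a minimal check, not from the original answer):

```python
r = range(5)
print(iter(r) is iter(r))  # False -- each iter() call creates a fresh iterator

g = (p for p in range(5))
print(iter(g) is g)        # True -- a generator is its own one-shot iterator
```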
You can imitate what range(20) is doing with:
def gen():
    return (p for p in range(20))

for y in [1, 2, 3]:
    print(min(x + y for x in gen()))
    # range(), like gen(), is a promise to generate a fresh iterator
## 1
## 2
## 3
Now the generator is created anew every time.
But in fact, range is even cooler, if you do:
i = range(20)
for y in [1, 2, 3]:
print(min(x + y for x in i))
## 1
## 2
## 3
The i inside the innermost generator expression is not a function call.
But despite that, when evaluated it creates a new iterator, at least when used as an iterable within a for loop.
This is implemented in Python using a class that defines the __iter__() method, which determines the behavior of iterators; here, in particular, a lazy behavior.
To imitate this behavior, we can write a lazy generator (lazy_gen).
class lazy_gen:
    def __init__(self):
        pass

    def __iter__(self):    # called every time it is used as an iterable
        return self.gen()  # recreate the generator: real lazy behavior

    def gen(self):
        return (p for p in range(20))
Which we can use like:
i = lazy_gen()
for y in [1, 2, 3]:
    print(min(x + y for x in i))
## 1
## 2
## 3
So this reflects even better the range() behavior.
Other, more functional languages, such as the Lisp family (Common Lisp, Racket, Scheme, Clojure), R, or Haskell, have finer control over evaluation, and thus over lazy evaluation and promises. In Python, such implementations and fine-grained control require resorting to OOP.
My range function and class
Finally, I figured out how the range function must roughly be implemented.
(For fun; I know I could have looked it up in the Python source code, but sometimes reasoning is fun.)
class Myrange:
    def __init__(self, start, end, step):
        self.start = start
        self.end = end
        self.step = step

    def __iter__(self):
        return self.generate_range()

    def generate_range(self):
        x = self.start - self.step
        while x + self.step < self.end:
            x = x + self.step
            yield x

    def __repr__(self):
        return "myrange({}, {})".format(self.start, self.end)

def myrange(start=None, end=None, step=1):
    if start is None and end is None:
        raise ValueError("Please provide at least one number for the range limits.")
    elif start is not None and end is None:
        _start = 0
        _end = start
    elif start is not None and end is not None:
        _start = start
        _end = end
    else:
        _start = 0
        _end = end
    _step = step
    return Myrange(_start, _end, _step)
One can use it exactly like the range function.
i = myrange(20)
n = [1, 2, 3]
result = [min(x + y for x in i) for y in n]
result
## [1, 2, 3]
i
## myrange(0, 20) # representation of a Myrange object.
myrange(20)
## myrange(0, 20)
list(myrange(3, 10))
## [3, 4, 5, 6, 7, 8, 9]
list(myrange(0, 10))
## [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(myrange(10))
## [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(myrange(0, 10, 2))
## [0, 2, 4, 6, 8]
list(myrange(3, 10, 2))
## [3, 5, 7, 9]
In both of your last examples, you try to iterate over the generator again after it has been exhausted.
In your last example, list(i) is evaluated again for each value of y, so i will be exhausted after the first run.
You have to make a list of the values it yields once before, as in:
i = (p for p in range(20))
n = [1, 2, 3]
list_i = list(i)
result = [min(x + y for x in list_i) for y in n]
The generator is emptied after the first for loop, whether you write for x in i or for x in list(i). Instead, you need to convert the generator to a list beforehand (which iterates over the generator and empties it) and then use that list.
Note that this essentially defeats the purpose of a generator, since it becomes the same as the first approach:
In [14]: list(range(20)) == list(p for p in range(20))
Out[14]: True
Hence the updated code will be
#Create generator and convert to list
i = list(p for p in range(20))
n = [1, 2, 3]
#Use that list in the list comprehension
result = [min(x + y for x in i) for y in n]
print(result)
The output will be
[1, 2, 3]
Hence the better approach is to stick with the first one, or you can write the generator inline, which, again, is the same as the first approach with range:
n = [1, 2, 3]
result = [min(x + y for x in (p for p in range(20))) for y in n]
print(result)
#[1, 2, 3]
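If you do want to reuse a generator's output without building the list yourself, itertools.tee is another option (a sketch, not from the original answers). Note that consuming the copies one after another forces tee to buffer every value anyway, so it saves no memory here:

```python
import itertools

g = (p for p in range(20))
copies = itertools.tee(g, 3)  # three independent, buffered views of one generator
result = [min(x + y for x in it) for it, y in zip(copies, [1, 2, 3])]
print(result)  # [1, 2, 3]
```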

Strange behavior of generator from function

While playing with generators, I found an interesting thing. I defined a function with the yield keyword and received a generator from it; I also deleted the variable with the sequence that was fed to the function. And *POOF!* - the generator became empty. Here are my steps:
>>> def foo(list_):
...     for i in list_:
...         yield i*2
...
>>> val = [1, 2, 3, 4]
>>> gen = foo(val)
>>> print(tuple(gen))
(2, 4, 6, 8)
>>> del val
>>> print(tuple(gen))
()
>>>
Shouldn't it be immutable? Or, if it actually works as an object that feeds all the values of its variable to its function, giving the output, why didn't it throw an exception due to the absence of the linked sequence? Actually, that example could be explained by my iterating over an empty sequence, so that the block for _ in []: would never start. But I can't explain why this does not throw an exception:
>>> def foo(list_):
...     for i in list_:
...         yield i*2
...
>>> val = [1, 2, 3, 4]
>>> gen = foo(val)
>>> print(tuple(gen))
(2, 4, 6, 8)
>>> del foo
>>> print(tuple(gen))
()
>>>
Do generators act here similarly to the dict.get() function? I don't understand this.
This has nothing to do with objects being deleted. Instead, you have exhausted the generator; generators can only be iterated over once. Create a new generator from the generator function if you need to iterate a second time. You see the same behaviour without deleting references:
>>> def foo(list_):
...     for i in list_:
...         yield i*2
...
>>> val = [1, 2, 3, 4]
>>> gen = foo(val)
>>> list(gen)
[2, 4, 6, 8]
>>> list(gen) # gen is now exhausted, so no more elements are produced
[]
>>> gen = foo(val) # new generator
>>> list(gen)
[2, 4, 6, 8]
Note that del foo or del val only deletes the reference to an object. It wouldn't delete the function or list object altogether if there are other references to it, like from an existing generator created from the function. So gen = foo(val); del foo, val won't break the gen object, it can still produce values without either the foo or val references existing, because gen itself still references what it needs to complete:
>>> gen = foo(val)
>>> del foo, val
>>> list(gen) # gen still references the list and code objects
[2, 4, 6, 8]
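You can even inspect that reference on the suspended generator's frame; gi_frame is a documented attribute of generator objects (a small check, not from the original answer):

```python
def foo(list_):
    for i in list_:
        yield i * 2

val = [1, 2, 3, 4]
gen = foo(val)
next(gen)        # start the generator so its frame is live
del val          # drop our name for the list; gen still holds a reference
print(gen.gi_frame.f_locals['list_'])  # [1, 2, 3, 4]
print(list(gen))                       # [4, 6, 8] -- it continues happily
```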

Flattening nested generator expressions

I'm trying to flatten a nested generator of generators but I'm getting an unexpected result:
>>> g = ((3*i + j for j in range(3)) for i in range(3))
>>> list(itertools.chain(*g))
[6, 7, 8, 6, 7, 8, 6, 7, 8]
I expected the result to look like this:
[0, 1, 2, 3, 4, 5, 6, 7, 8]
I think I'm getting the unexpected result because the inner generators are not being evaluated until the outer generator has already been iterated over, setting i to 2. I can hack together a solution by forcing evaluation of the inner generators by using a list comprehension instead of a generator expression:
>>> g = ([3*i + j for j in range(3)] for i in range(3))
>>> list(itertools.chain(*g))
[0, 1, 2, 3, 4, 5, 6, 7, 8]
Ideally, I would like a solution that's completely lazy and doesn't force evaluation of the inner nested elements until they're used.
Is there a way to flatten nested generator expressions of arbitrary depth (maybe using something other than itertools.chain)?
Edit:
No, my question is not a duplicate of Variable Scope In Generators In Classes. I honestly can't tell how these two questions are related at all. Maybe the moderator could explain why he thinks this is a duplicate.
Also, both answers to my question are correct in that they can be used to write a function that flattens nested generators correctly.
def flattened1(iterable):
    iter1, iter2 = itertools.tee(iterable)
    if isinstance(next(iter1), collections.Iterable):
        return flattened1(x for y in iter2 for x in y)
    else:
        return iter2

def flattened2(iterable):
    iter1, iter2 = itertools.tee(iterable)
    if isinstance(next(iter1), collections.Iterable):
        return flattened2(itertools.chain.from_iterable(iter2))
    else:
        return iter2
As far as I can tell with timeit, they both perform identically.
>>> timeit(test1, setup1, number=1000000)
18.173431718023494
>>> timeit(test2, setup2, number=1000000)
17.854709611972794
I'm not sure which one is better from a style standpoint either, since x for y in iter2 for x in y is a bit of a brain twister, but arguably more elegant than itertools.chain.from_iterable(iter2). Input is appreciated.
Regrettably, I was only able to mark one of the two equally good answers correct.
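On modern Python, collections.Iterable is gone (removed in 3.10), so the same idea would be written against collections.abc. A sketch, with the caveats the recursion inherits: strings are also Iterable (so they recurse forever), and an empty input makes the next() call raise StopIteration:

```python
import itertools
from collections.abc import Iterable

def flattened(iterable):
    # Peek at the first element via tee to decide whether to recurse.
    iter1, iter2 = itertools.tee(iterable)
    if isinstance(next(iter1), Iterable):
        return flattened(itertools.chain.from_iterable(iter2))
    return iter2

nested = ((3 * i + j for j in range(3)) for i in range(3))
flat = list(flattened(nested))
print(flat)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```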
Instead of using chain(*g), you can use chain.from_iterable:
>>> g = ((3*i + j for j in range(3)) for i in range(3))
>>> list(itertools.chain(*g))
[6, 7, 8, 6, 7, 8, 6, 7, 8]
>>> g = ((3*i + j for j in range(3)) for i in range(3))
>>> list(itertools.chain.from_iterable(g))
[0, 1, 2, 3, 4, 5, 6, 7, 8]
How about this:
[x for y in g for x in y]
Which yields:
[0, 1, 2, 3, 4, 5, 6, 7, 8]
Guess you already have your answer, but here's another perspective.
The problem is that when each inner generator is created, the value-generating expression is closed over the outer variable i so even when the first inner generator starts generating values, it's using the "current" value of i. This will have value i=2 if the outer generator has been fully consumed (and that's exactly the case right after the argument in the chain(*g) call is evaluated, before chain is actually called).
The following devious trick will work around the problem:
g = ((3*i1 + j for i1 in [i] for j in range(3)) for i in range(3))
Note that these inner generators aren't closed over i because the for clauses are evaluated at generator creation time so the singleton list [i] is evaluated and its value "frozen" in the face of further changes to the value of i.
This approach has the advantage over the from_iterable answer that it's a little more general if you want to use it outside a chain.from_iterable call -- it will always produce the "correct" inner generators, whether the outer generator is partially or fully consumed before the inner generators are used. For example, in the following code:
g = ((3*i1 + j for i1 in [i] for j in range(3)) for i in range(3))
g1 = next(g)
g2 = next(g)
g3 = next(g)
you can insert the lines:
list(g1)
list(g2)
list(g3)
in any order at any point after the respective inner generator has been defined, and you'll get the correct results.
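Putting the trick back into the original chain(*g) call confirms it (a quick check, not in the original answer):

```python
import itertools

# Each inner generator freezes its own copy of i in the singleton list [i],
# so fully consuming the outer generator first no longer matters.
g = ((3 * i1 + j for i1 in [i] for j in range(3)) for i in range(3))
flat = list(itertools.chain(*g))
print(flat)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```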

Difference between list() and dict() with generators [duplicate]

This question already has answers here: Generator expressions vs. list comprehensions (13 answers). Closed 7 years ago.
So what's the explanation behind the difference between list() and dict() in the following example:
glist = (x for x in (1, 2, 3))
print(list(glist))
print(list(glist))
gdict = {x:y for x,y in ((1,11), (2,22), (3,33))}
print(dict(gdict))
print(dict(gdict))
>>>
[1, 2, 3]
[]
{1: 11, 2: 22, 3: 33}
{1: 11, 2: 22, 3: 33}
The difference is that only the first expression, glist, is a generator; the second one, gdict, is a dict comprehension. The two would only be equivalent if you changed the first one to [x for x in (1, 2, 3)].
A comprehension is evaluated immediately.
These are completely different things. The first expression is a generator: after the first iteration, it is exhausted, so further iterations are empty.
The second is a dict comprehension: like a list comprehension, it returns a new object each time, in this case a dict. So each iteration is over a new dict.
An example makes this easier to understand.
Call the generator's next method to yield each element (the transcript below is Python 2 syntax; in Python 3, use next(a) instead):
>>> a = (i for i in range(4))
>>> a.next()
0
>>> a.next()
1
>>> a.next()
2
>>> a.next()
3
>>> a.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>>
>>> list(a)
[]
Now call the list function on our generator object.
>>> a = (i for i in range(4))
>>> list(a)
[0, 1, 2, 3]
>>> list(a)
[]
Now call list on our list comprehension.
>>> a = [i for i in range(4)]
>>> list(a)
[0, 1, 2, 3]
>>> list(a)
[0, 1, 2, 3]
So list comprehensions and dict comprehensions are similar in that they produce actual data, unlike a generator, which yields its elements one at a time and only once.
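One practical consequence of that difference: a comprehension materializes all its elements up front, while a generator object stays small no matter how many values it will yield (a rough check; exact byte counts vary by Python version and platform):

```python
import sys

gen = (i for i in range(1_000_000))  # a million pending values
lst = [i for i in range(1000)]       # only a thousand, but materialized

# The generator object's size is a small constant; the list grows with its data.
print(sys.getsizeof(gen) < sys.getsizeof(lst))  # True
```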
