Accessing the List Being Generated by List Comprehension [duplicate] - python

So here's a trivial example that is probably better done some other way.
Here is the regular for-loop version:
lst1 = ['abc', 'abc', 'cde', 'cde']
lst2 = []
for i in lst1:
    if i not in lst2:
        lst2.append(i)
And the non-working list comprehension approximation:
lst2 = [i for i in lst1 if i not in lst2]
# NameError: name 'lst2' is not defined
So the question: is it possible to access the list being produced by a list comprehension as it is being made?

No.
But if order is important, you want this answer from another question:
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(['abc', 'abc', 'cde', 'cde']))
['abc', 'cde']
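On Python 3.7 and later, plain dicts also preserve insertion order, so the same idea should work without the import. A minimal sketch, assuming Python 3.7+ (the name deduped is only for illustration):

```python
lst1 = ['abc', 'abc', 'cde', 'cde']

# dict.fromkeys keeps the first occurrence of each key, in insertion order
# (guaranteed by the language from Python 3.7 onwards).
deduped = list(dict.fromkeys(lst1))
print(deduped)
```

Like the OrderedDict version, this requires the items to be hashable.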

TL;DR: There's no easy way to do a recursive list comprehension.
Why? When the interpreter reaches this line, it first evaluates the right-hand side of the assignment (the list comprehension) and tries to construct the list. Only after construction does it assign the new list to lst2. But while the list is being constructed, you are referring to lst2, which isn't defined yet.
You can look at the bytecode generated:
>>> import dis
>>> def test(lst1):
...     lst2 = [i for i in lst1 if i not in lst2]
...
>>> dis.dis(test)
  2           0 BUILD_LIST               0
              3 LOAD_FAST                0 (lst1)
              6 GET_ITER
        >>    7 FOR_ITER                24 (to 34)  # the list comprehension is converted into a for loop
             10 STORE_FAST               1 (i)
             13 LOAD_FAST                1 (i)
             16 LOAD_FAST                2 (lst2)   # try to load lst2, which doesn't exist yet
             19 COMPARE_OP               7 (not in)
             22 POP_JUMP_IF_FALSE        7
             25 LOAD_FAST                1 (i)
             28 LIST_APPEND              2
             31 JUMP_ABSOLUTE            7
        >>   34 STORE_FAST               2 (lst2)
             37 LOAD_CONST               0 (None)
             40 RETURN_VALUE
Solution: What you want to do is to define a set:
>>> lst1 = ['abc', 'abc', 'cde', 'cde']
>>> set(lst1)
set(['cde', 'abc'])
(I hope the order of the elements doesn't matter to you :-) ) If the order matters:
>>> tmp = set() # create a set of already added elements
>>> [x for x in lst1 if x not in tmp and not tmp.add(x)]
['abc', 'cde']

No, that is not possible. But you can do something like this:
>>> seen = set()
>>> [x for x in lst1 if x not in seen and not seen.add(x)]
['abc', 'cde']
If order doesn't matter, then simply use set(lst1).
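The one-liner relies on set.add() returning None, so "not seen.add(x)" is always true and is only there for its side effect. If that reads as too clever, the same logic can be spelled out in a loop. A sketch, with unique_in_order as a hypothetical helper name:

```python
def unique_in_order(items):
    """Return the items with duplicates dropped, keeping first occurrences."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)      # remember it so later duplicates are skipped
            result.append(item)
    return result


lst1 = ['abc', 'abc', 'cde', 'cde']

# The comprehension form from the answer, for comparison.
seen = set()
one_liner = [x for x in lst1 if x not in seen and not seen.add(x)]

print(unique_in_order(lst1), one_liner)
```

Both forms need hashable items, since they are stored in a set.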

No. Inside the list comprehension, lst2 still refers to whatever it was bound to before (here, the empty list []) until the comprehension returns. Try this:
print id(lst2)
lst2 = [ (i,id(lst2)) for i in lst1 if i not in lst2]
print lst2
print id(lst2)
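A Python 3 sketch of the same experiment, assuming lst2 starts out as an empty list (show_old_binding is an illustrative name, not from the answer). It records whether the lst2 seen inside the comprehension is still the old object:

```python
def show_old_binding(lst1):
    lst2 = []
    old_id = id(lst2)
    # Inside the comprehension, lst2 is still the old (empty) list, so the
    # "not in" filter never rejects anything.
    lst2 = [(i, id(lst2) == old_id) for i in lst1 if i not in lst2]
    return lst2


result = show_old_binding(['abc', 'abc', 'cde', 'cde'])
print(result)
```

Every tuple carries True and no duplicate is filtered out, which is consistent with the name only being rebound after the comprehension finishes.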

Related

How to get all elements of an iterator using slice() built-in function in python without using the iterator's length?

If I have the following list: my_list = [1, 2, 3]. How can I take all the elements using slice() built-in method without knowing the length of my_list?
my_list[slice(-1)] gives me [1, 2] and my_list[slice(':')] gives me a TypeError.
Is there something similar to my_list[:] that I can use with slice so that I can define a variable before the list is created?
You can pass None
>>> my_list = [1, 2, 3]
>>> id(my_list)
1827682884416
>>> copy = my_list[slice(None)]
>>> id(copy)
1827682861888
>>> copy
[1, 2, 3]
>>> my_list is copy
False
I believe that my_list[:] is just syntactic sugar. You can confirm this with the dis module.
>>> import dis
>>> dis.dis("my_list[:]")
  1           0 LOAD_NAME                0 (my_list)
              2 LOAD_CONST               0 (None)
              4 LOAD_CONST               0 (None)
              6 BUILD_SLICE              2
              8 BINARY_SUBSCR
             10 RETURN_VALUE
As you can see, the [:] syntax gets compiled down to two None constants, which are then used to build the slice.
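Since a slice object does not depend on any particular list, it can be created before the list exists, which is what the question asks for. A small sketch (take_all is just an illustrative name):

```python
# Define the slice first; None for start/stop/step means "use the default".
take_all = slice(None)

my_list = [1, 2, 3]
copy = my_list[take_all]    # equivalent to my_list[:]

print(copy, copy is my_list)
```

The result is a shallow copy, so the new list is a different object holding the same elements.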

Yield from in listcomp or genexpr difference [duplicate]

The following behaviour seems rather counterintuitive to me (Python 3.4):
>>> [(yield i) for i in range(3)]
<generator object <listcomp> at 0x0245C148>
>>> list([(yield i) for i in range(3)])
[0, 1, 2]
>>> list((yield i) for i in range(3))
[0, None, 1, None, 2, None]
The intermediate values of the last line are actually not always None, they are whatever we send into the generator, equivalent (I guess) to the following generator:
def f():
    for i in range(3):
        yield (yield i)
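The generator function above can be driven by hand to see where sent values end up. A sketch using only next() and send(); the variable names are illustrative:

```python
def f():
    for i in range(3):
        yield (yield i)


gen = f()
first = next(gen)             # runs to the inner yield, producing i
echoed = gen.send('hello')    # inner yield evaluates to 'hello'; outer yield emits it
second = next(gen)            # outer yield receives None; loop moves on to the next i

print(first, echoed, second)
print(list(f()))              # with nothing sent, the echoed values are all None
```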
It strikes me as funny that those three lines work at all. The Reference says that yield is only allowed in a function definition (though I may be reading it wrong and/or it may simply have been copied from the older version). The first two lines produce a SyntaxError in Python 2.7, but the third line doesn't.
Also, it seems odd
that a list comprehension returns a generator and not a list
and that the generator expression converted to a list and the corresponding list comprehension contain different values.
Could someone provide more information?
Note: this was a bug in CPython's handling of yield in comprehensions and generator expressions, fixed in Python 3.8, with a deprecation warning in Python 3.7. See the Python bug report and the What's New entries for Python 3.7 and Python 3.8.
Generator expressions, and set and dict comprehensions are compiled to (generator) function objects. In Python 3, list comprehensions get the same treatment; they are all, in essence, a new nested scope.
You can see this if you try to disassemble a generator expression:
>>> dis.dis(compile("(i for i in range(3))", '', 'exec'))
  1           0 LOAD_CONST               0 (<code object <genexpr> at 0x10f7530c0, file "", line 1>)
              3 LOAD_CONST               1 ('<genexpr>')
              6 MAKE_FUNCTION            0
              9 LOAD_NAME                0 (range)
             12 LOAD_CONST               2 (3)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 GET_ITER
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 POP_TOP
             23 LOAD_CONST               3 (None)
             26 RETURN_VALUE
>>> dis.dis(compile("(i for i in range(3))", '', 'exec').co_consts[0])
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                11 (to 17)
              6 STORE_FAST               1 (i)
              9 LOAD_FAST                1 (i)
             12 YIELD_VALUE
             13 POP_TOP
             14 JUMP_ABSOLUTE            3
        >>   17 LOAD_CONST               0 (None)
             20 RETURN_VALUE
The above shows that a generator expression is compiled to a code object, loaded as a function (MAKE_FUNCTION creates the function object from the code object). The .co_consts[0] reference lets us see the code object generated for the expression, and it uses YIELD_VALUE just like a generator function would.
As such, the yield expression works in that context, as the compiler sees these as functions-in-disguise.
This is a bug; yield has no place in these expressions. The Python grammar before Python 3.7 allows it (which is why the code is compilable), but the yield expression specification shows that using yield here should not actually work:
The yield expression is only used when defining a generator function and thus can only be used in the body of a function definition.
This has been confirmed to be a bug in issue 10544. The resolution of the bug is that using yield and yield from will raise a SyntaxError in Python 3.8; in Python 3.7 it raises a DeprecationWarning to ensure code stops using this construct. You'll see the same warning in Python 2.7.15 and up if you use the -3 command line switch enabling Python 3 compatibility warnings.
The 3.7.0b1 warning looks like this; turning warnings into errors gives you a SyntaxError exception, like you would in 3.8:
>>> [(yield i) for i in range(3)]
<stdin>:1: DeprecationWarning: 'yield' inside list comprehension
<generator object <listcomp> at 0x1092ec7c8>
>>> import warnings
>>> warnings.simplefilter('error')
>>> [(yield i) for i in range(3)]
File "<stdin>", line 1
SyntaxError: 'yield' inside list comprehension
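On an interpreter that already has the fix, the rejection can be checked programmatically by compiling the expression from a string. This is a sketch that assumes Python 3.8 or later; on older versions the compile call would simply succeed:

```python
source = "[(yield i) for i in range(3)]"

try:
    compile(source, "<example>", "eval")
    outcome = "compiled"
except SyntaxError as exc:
    # exc.msg holds the short message without file and line details.
    outcome = "SyntaxError: " + str(exc.msg)

print(outcome)
```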
The differences between how yield in a list comprehension and yield in a generator expression operate stem from the differences in how these two expressions are implemented. In Python 3 a list comprehension uses LIST_APPEND calls to add the top of the stack to the list being built, while a generator expression instead yields that value. Adding in (yield <expr>) just adds another YIELD_VALUE opcode to either:
>>> dis.dis(compile("[(yield i) for i in range(3)]", '', 'exec').co_consts[0])
  1           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                13 (to 22)
              9 STORE_FAST               1 (i)
             12 LOAD_FAST                1 (i)
             15 YIELD_VALUE
             16 LIST_APPEND              2
             19 JUMP_ABSOLUTE            6
        >>   22 RETURN_VALUE
>>> dis.dis(compile("((yield i) for i in range(3))", '', 'exec').co_consts[0])
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                12 (to 18)
              6 STORE_FAST               1 (i)
              9 LOAD_FAST                1 (i)
             12 YIELD_VALUE
             13 YIELD_VALUE
             14 POP_TOP
             15 JUMP_ABSOLUTE            3
        >>   18 LOAD_CONST               0 (None)
             21 RETURN_VALUE
The YIELD_VALUE opcode at bytecode indexes 15 and 12 respectively is extra, a cuckoo in the nest. So for the list-comprehension-turned-generator you have 1 yield producing the top of the stack each time (replacing the top of the stack with the yield return value), and for the generator expression variant you yield the top of the stack (the integer) and then yield again, but now the stack contains the return value of the yield and you get None that second time.
For the list comprehension then, the intended list object output is still returned, but Python 3 sees this as a generator so the return value is instead attached to the StopIteration exception as the value attribute:
>>> from itertools import islice
>>> listgen = [(yield i) for i in range(3)]
>>> list(islice(listgen, 3)) # avoid exhausting the generator
[0, 1, 2]
>>> try:
...     next(listgen)
... except StopIteration as si:
...     print(si.value)
...
[None, None, None]
Those None objects are the return values from the yield expressions.
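The same mechanism (a generator's return value travelling on StopIteration.value) can be reproduced with an ordinary generator function, which still runs on current Python versions. A sketch; collect is an illustrative name, written to mimic what the comprehension did:

```python
def collect():
    received = []
    for i in range(3):
        # Each yield hands i to the caller; its result is whatever was sent
        # back in, or None when the caller just uses next().
        received.append((yield i))
    return received


gen = collect()
yielded = []
while True:
    try:
        yielded.append(next(gen))
    except StopIteration as stop:
        returned = stop.value   # the generator's return value
        break

print(yielded, returned)
```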
To reiterate: this same issue applies to dictionary and set comprehensions in Python 2 and Python 3 as well; in Python 2 the yield return values are still added to the intended dictionary or set object, and the return value is 'yielded' last instead of attached to the StopIteration exception:
>>> list({(yield k): (yield v) for k, v in {'foo': 'bar', 'spam': 'eggs'}.items()})
['bar', 'foo', 'eggs', 'spam', {None: None}]
>>> list({(yield i) for i in range(3)})
[0, 1, 2, set([None])]

Should I use 'in' or 'or' in an if statement in Python 3.x to check a variable against multiple values?

Suppose I have the following, which is the better, faster, more Pythonic method
and why?
if x == 2 or x == 3 or x == 4:
    do following...
or:
if x in (2, 3, 4):
    do following...
In Python 3 (3.2 and up), you should use a set:
if x in {2, 3, 4}:
as set membership is an O(1) test on average, versus a worst-case performance of O(N) for testing with separate or equality tests or using membership in a tuple.
In Python 3, the set literal will be optimised to use a frozenset constant:
>>> import dis
>>> dis.dis(compile('x in {1, 2, 3}', '<file>', 'exec'))
  1           0 LOAD_NAME                0 (x)
              3 LOAD_CONST               4 (frozenset({1, 2, 3}))
              6 COMPARE_OP               6 (in)
              9 POP_TOP
             10 LOAD_CONST               3 (None)
             13 RETURN_VALUE
Note that this optimisation was added in Python 3.2; in Python 2, 3.0 or 3.1 you'd be better off using a tuple instead. There, for a small number of elements, any gain in lookup time is cancelled out by rebuilding the set on each execution.
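One practical difference between the three spellings is worth knowing: the set form hashes x, so it raises for unhashable values where the other two simply return False. A sketch with hypothetical helper names:

```python
def matches_or(x):
    return x == 2 or x == 3 or x == 4


def matches_tuple(x):
    return x in (2, 3, 4)


def matches_set(x):
    return x in {2, 3, 4}


# For hashable values the three forms agree.
agree = all(matches_or(x) == matches_tuple(x) == matches_set(x) for x in range(6))

# For an unhashable value, only the set form fails.
try:
    matches_set([2])
    set_form_raised = False
except TypeError:
    set_form_raised = True

print(agree, matches_tuple([2]), set_form_raised)
```

If x can be a list or another unhashable object, the tuple form may therefore be the safer choice.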

Creating Simultaneous Loops in Python

I want to create a loop with this intent:
for i in xrange(0,10):
    for k in xrange(0,10):
        z=k+i
        print z
where the output should be
0
2
4
6
8
10
12
14
16
18
You can use zip to turn multiple lists (or iterables) into pairwise* tuples:
>>> for a,b in zip(xrange(10), xrange(10)):
...     print a+b
...
0
2
4
6
8
10
12
14
16
18
But zip will not scale as well as izip (that sth mentioned) on larger sets. zip's advantage is that it is a built-in and you don't have to import itertools -- and whether that is actually an advantage is subjective.
*Not just pairwise, but n-wise. The tuples' length will be the same as the number of iterables you pass in to zip.
The itertools module contains an izip function that combines iterators in the desired way:
from itertools import izip
for (i, k) in izip(xrange(0,10), xrange(0,10)):
    print i+k
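The snippets in this thread are Python 2. In Python 3, xrange and itertools.izip no longer exist; the built-in range and zip are lazy themselves. A sketch of the equivalent loop under that assumption (sums is an illustrative name):

```python
sums = []
for i, k in zip(range(10), range(10)):
    # zip pairs the two ranges element by element, so i and k advance together.
    sums.append(i + k)

for z in sums:
    print(z)
```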
You can do this in Python; you just have to get the indentation right and use xrange's step argument.
for i in xrange(0, 20, 2):
    print i
What about this?
i = range(0,10)
k = range(0,10)
for x in range(0,10):
    z=k[x]+i[x]
    print z
0
2
4
6
8
10
12
14
16
18
What you want is two arrays and one loop: iterate over both arrays once, adding the paired elements.
