I tried mutating a list by swapping its first element with an element it has in common with another, reference list. The implementation is shown below:
>>> L = [1,2,3,4,5,6,7,8,9]
>>> A = [3]
>>> L[0], L[L.index(A[0])] = L[L.index(A[0])], L[0] #want to swap 3 with 1
>>> L
[1, 2, 3, 4, 5, 6, 7, 8, 9] # list L was not mutated
The list was not mutated as I anticipated. But when I modified the implementation as shown below, it worked:
>>> L = [1,2,3,4,5,6,7,8,9]
>>> A = [3]
>>> i = L.index(A[0])
>>> L[0], L[i] = L[i], L[0]
>>> L
[3, 2, 1, 4, 5, 6, 7, 8, 9] # now the list is mutated as desired, even though L[i] and L[L.index(A[0])] evaluate to the same value
My question is: why couldn't the first assignment mutate the list? I have thought about it, but I cannot explain it.
Although in Python the right-hand side is evaluated first in a multiple assignment, the assignment targets on the left-hand side, if they contain expressions, are evaluated one by one as each assignment is made.
If they were instead all evaluated as assignment targets first, as you appear to expect, this would of course work.
This is documented in the assignment statements section:
An assignment statement evaluates the expression list (remember that this can be a single expression or a comma-separated list, the latter yielding a tuple) and assigns the single resulting object to each of the target lists, from left to right.
and
If the target list is a comma-separated list of targets: The object must be an iterable with the same number of items as there are targets in the target list, and the items are assigned, from left to right, to the corresponding targets.
Emphasis mine. The left-to-right ordering is crucial here: L[0] is assigned to before L[L.index(3)] is.
The documentation then describes in detail what happens to a subscription target such as L[0] and L[L.index(3)]:
If the target is a subscription: The primary expression in the reference is evaluated. It should yield either a mutable sequence object (such as a list) or a mapping object (such as a dictionary). Next, the subscript expression is evaluated.
Again, emphasis mine; the subscript expression is evaluated separately, and since the target list is evaluated from left-to-right, that evaluation takes place after the previous assignment to L[0].
You can see this by disassembling the Python code:
>>> import dis
>>> def f(L):
... L[0], L[2] = L[2], L[0]
...
>>> def g(L):
... L[0], L[L.index(3)] = L[L.index(3)], L[0]
...
>>> dis.dis(f)
2 0 LOAD_FAST 0 (L) # L[2]
3 LOAD_CONST 1 (2)
6 BINARY_SUBSCR
7 LOAD_FAST 0 (L) # L[0]
10 LOAD_CONST 2 (0)
13 BINARY_SUBSCR
14 ROT_TWO
15 LOAD_FAST 0 (L) # Store in L[0]
18 LOAD_CONST 2 (0)
21 STORE_SUBSCR
22 LOAD_FAST 0 (L) # Store in L[2]
25 LOAD_CONST 1 (2)
28 STORE_SUBSCR
29 LOAD_CONST 0 (None)
32 RETURN_VALUE
>>> dis.dis(g)
2 0 LOAD_FAST 0 (L) # L[L.index(3)]
3 LOAD_FAST 0 (L)
6 LOAD_ATTR 0 (index)
9 LOAD_CONST 1 (3)
12 CALL_FUNCTION 1
15 BINARY_SUBSCR
16 LOAD_FAST 0 (L) # L[0]
19 LOAD_CONST 2 (0)
22 BINARY_SUBSCR
23 ROT_TWO
24 LOAD_FAST 0 (L) # Store in L[0]
27 LOAD_CONST 2 (0)
30 STORE_SUBSCR
31 LOAD_FAST 0 (L) # Store in L[L.index(3)]
34 LOAD_FAST 0 (L)
37 LOAD_ATTR 0 (index)
40 LOAD_CONST 1 (3)
43 CALL_FUNCTION 1
46 STORE_SUBSCR
47 LOAD_CONST 0 (None)
50 RETURN_VALUE
The storing operation first stores L[0] = 3, so the next call to L.index(3) returns 0 and the 1 is thus stored right back in position 0!
The following does work:
L[L.index(3)], L[0] = L[0], L[L.index(3)]
because now the L.index(3) lookup is done first. However, it's best to store the result of an .index() call in a temporary variable, as calling .index() only once is more efficient in any case.
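As a small sketch of that recommendation (the helper name swap_with_first is my own, not from the answer):

def swap_with_first(lst, value):
    # look up the position once, then do a plain tuple swap
    i = lst.index(value)
    lst[0], lst[i] = lst[i], lst[0]

L = [1, 2, 3, 4, 5, 6, 7, 8, 9]
swap_with_first(L, 3)
# L is now [3, 2, 1, 4, 5, 6, 7, 8, 9]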
The problem is that the two are not equivalent. The first example is akin to doing:
>>> L = [1,2,3,4,5,6,7,8,9]
>>> A = [3]
>>> i = L.index(A[0])
>>> L[0] = L[i]
>>> i = L.index(A[0])
>>> L[i] = L[0]
This means that you end up swapping, then finding the element you just swapped and swapping back.
The reason you are confused is that you are thinking of the tuple assignment as Python doing both things at the same time. That isn't how it is executed; the assignments happen in a defined order, and that order changes the outcome.
It's worth noting that even if it did work, it would be a sub-optimal way of doing this. list.index() isn't a particularly fast operation, so doing it twice for no reason isn't a great idea.
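If you want to watch this happen step by step, you can run the intermediate assignments by hand (my own walkthrough of the same idea, not from the original answer); remember the right-hand side has already been evaluated to (3, 1) before any target is assigned:

>>> L = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> L[0] = 3          # first target assigned
>>> L
[3, 2, 3, 4, 5, 6, 7, 8, 9]
>>> L.index(3)        # subscript for the second target is evaluated only now
0
>>> L[0] = 1          # so the 1 goes straight back into position 0
>>> L
[1, 2, 3, 4, 5, 6, 7, 8, 9]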
I tried the snippet below:
a, b = a[b] = {}, 5
print('a={0},b={1}'.format(a,b))
The IDE outputs the following:
a={5: ({...}, 5)},b=5
I followed S3DEV's advice and executed:
from dis import dis
dis('a, b = a[b] = {}, 5')
And it gives me the following:
1 0 BUILD_MAP 0
2 LOAD_CONST 0 (5)
4 BUILD_TUPLE 2
6 DUP_TOP
8 UNPACK_SEQUENCE 2
10 STORE_NAME 0 (a)
12 STORE_NAME 1 (b)
14 LOAD_NAME 0 (a)
16 LOAD_NAME 1 (b)
18 STORE_SUBSCR
20 LOAD_CONST 1 (None)
22 RETURN_VALUE
But I still cannot understand why a[b] = a, 5 happens at step 18 STORE_SUBSCR. Can anyone explain further?
This is an assignment statement with multiple target lists, for which case the docs say that the statement "assigns the single resulting object to each of the target lists, from left to right." Within each target list, assignments also proceed left to right.
Thus, the statement is equivalent to
a = {}
b = 5
a[b] = a, 5
The reason that the last assignment is a[b]=a,5 and not a,b={},5 is that the value ({}, 5) is only evaluated once, so it's the same dict that gets used throughout. First, a is set to refer to that dict, then the dict — through a — is modified to refer to itself.
EDIT: Perhaps it is clearer to say that the statement is equivalent to
temp1 = {}
temp2 = 5
a = temp1
b = temp2
a[b] = temp1, temp2
Right before the last step, a and temp1 refer to the same object, which thus becomes self-referring after the last step.
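A quick way to confirm the self-reference (my own check, not part of the original answer):

>>> a, b = a[b] = {}, 5
>>> a[5][0] is a
True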
This is not code I want to see in production. :)
I found the assignment a = a[1:] = [2] in an article. I tried it in Python 3 and Python 2; it works in both, but I don't understand how it works. = here is not like in C, where = is processed from right to left. How does Python process the = operator?
Per the language docs on assignment:
An assignment statement evaluates the expression list (remember that this can be a single expression or a comma-separated list, the latter yielding a tuple) and assigns the single resulting object to each of the target lists, from left to right.
In this case, a = a[1:] = [2] has an expression list [2], and two "target lists", a and a[1:], where a is the left-most "target list".
You can see how this behaves by looking at the disassembly:
>>> import dis
>>> dis.dis('a = a[1:] = [2]')
1 0 LOAD_CONST 0 (2)
2 BUILD_LIST 1
4 DUP_TOP
6 STORE_NAME 0 (a)
8 LOAD_NAME 0 (a)
10 LOAD_CONST 1 (1)
12 LOAD_CONST 2 (None)
14 BUILD_SLICE 2
16 STORE_SUBSCR
18 LOAD_CONST 2 (None)
20 RETURN_VALUE
(The last two lines of the disassembly can be ignored; dis is making a function wrapper to disassemble the string.)
The important part to note is that when you do x = y = some_val, some_val is loaded on the stack (in this case by the LOAD_CONST and BUILD_LIST), then the stack entry is duplicated and assigned, from left to right, to the targets given.
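As a minimal aside (my own example, not from the question), the duplicated stack entry means both targets end up bound to the very same object, not copies:

>>> x = y = [2]
>>> x is y
True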
So when you do:
a = a[1:] = [2]
it makes two references to a brand-new list containing 2, and the first action is to store one of those references to a. Next, it stores the second reference to a[1:], but since slice assignment mutates a itself, it has to load a again, which retrieves the list just stored. Luckily, list is resilient against self-slice-assignment, or we'd have issues (it would forever be reading the value it just appended and adding it to the end, until we ran out of memory and crashed); as it is, it behaves as if a copy of [2] had been assigned to replace any and all elements from index one onwards.
The end result is equivalent to if you'd done:
_ = [2]
a = _
a[1:] = _
but it avoids the use of the _ name.
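For reference, running the statement confirms the end result (my own check):

>>> a = a[1:] = [2]
>>> a
[2, 2]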
To be clear, here is the disassembly, annotated:
Make list [2]:
1 0 LOAD_CONST 0 (2)
2 BUILD_LIST 1
Make a copy of the reference to [2]:
4 DUP_TOP
Perform store to a:
6 STORE_NAME 0 (a)
Perform store to a[1:]:
8 LOAD_NAME 0 (a)
10 LOAD_CONST 1 (1)
12 LOAD_CONST 2 (None)
14 BUILD_SLICE 2
16 STORE_SUBSCR
The way I understand such assignments is that this is equivalent to
temp = [2]
a = temp
a[1:] = temp
The resulting value of [2, 2] is consistent with this interpretation.
I'm using symtable to get the symbol tables of a piece of code. Curiously, when using a comprehension (listcomp, setcomp, etc.), there are some extra symbols I didn't define.
Reproduction (using CPython 3.6):
import symtable
root = symtable.symtable('[x for x in y]', '?', 'exec')
# Print symtable of the listcomp
print(root.get_children()[0].get_symbols())
Output:
[<symbol '.0'>, <symbol '_[1]'>, <symbol 'x'>]
Symbol x is expected. But what are .0 and _[1]?
Note that with any other non-comprehension construct I'm getting exactly the identifiers I used in the code. E.g., lambda x: y only results in the symbols [<symbol 'x'>, <symbol 'y'>].
Also, the docs say that symtable.Symbol is...
An entry in a SymbolTable corresponding to an identifier in the source.
...although these identifiers evidently don't appear in the source.
The two names are used to implement list comprehensions as a separate scope, and they have the following meaning:
.0 is an implicit argument, used for the iterable (sourced from y in your case).
_[1] is a temporary name in the symbol table, used for the target list. This list eventually ends up on the stack.*
A list comprehension (as well as dict and set comprehension and generator expressions) is executed in a new scope. To achieve this, Python effectively creates a new anonymous function.
Because it really is a function, you need to pass in the iterable you are looping over as an argument. This is what .0 is for; it is the first implicit argument (so at index 0). The symbol table you produced explicitly lists .0 as an argument:
>>> root = symtable.symtable('[x for x in y]', '?', 'exec')
>>> type(root.get_children()[0])
<class 'symtable.Function'>
>>> root.get_children()[0].get_parameters()
('.0',)
The first child of your table is a function with one argument named .0.
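As an aside, that separate scope is also why the loop variable does not leak into the surrounding namespace in Python 3 (my own example, not part of the symbol table output):

>>> y = [1, 2, 3]
>>> [x for x in y]
[1, 2, 3]
>>> x
Traceback (most recent call last):
  ...
NameError: name 'x' is not defined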
A list comprehension also needs to build the output list, and that list could be seen as a local too. This is the _[1] temporary variable. It never actually becomes a named local variable in the code object that is produced; this temporary variable is kept on the stack instead.
You can see the code object produced when you use compile():
>>> code_object = compile('[x for x in y]', '?', 'exec')
>>> code_object
<code object <module> at 0x11a4f3ed0, file "?", line 1>
>>> code_object.co_consts[0]
<code object <listcomp> at 0x11a4ea8a0, file "?", line 1>
So there is an outer code object, and in the constants, is another, nested code object. That latter one is the actual code object for the loop. It uses .0 and x as local variables. It also takes 1 argument; the names for arguments are the first co_argcount values in the co_varnames tuple:
>>> code_object.co_consts[0].co_varnames
('.0', 'x')
>>> code_object.co_consts[0].co_argcount
1
So .0 is the argument name here.
The _[1] temporary variable is handled on the stack, see the disassembly:
>>> import dis
>>> dis.dis(code_object.co_consts[0])
1 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 8 (to 14)
6 STORE_FAST 1 (x)
8 LOAD_FAST 1 (x)
10 LIST_APPEND 2
12 JUMP_ABSOLUTE 4
>> 14 RETURN_VALUE
Here we see .0 referenced again. _[1] corresponds to the BUILD_LIST opcode pushing a fresh list onto the stack; then the iterator from .0 is put on the stack for the FOR_ITER opcode to iterate over (FOR_ITER pops it again once it is exhausted).
Each iteration result is pushed onto the stack by FOR_ITER, popped again and stored in x with STORE_FAST, then loaded onto the stack again with LOAD_FAST. Finally LIST_APPEND takes the top element from the stack, and adds it to the list referenced by the next element on the stack, so to _[1].
JUMP_ABSOLUTE then brings us back to the top of the loop, where we continue to iterate until the iterable is done. Finally, RETURN_VALUE returns the top of the stack, again _[1], to the caller.
The outer code object does the work of loading the nested code object and calling it as a function:
>>> dis.dis(code_object)
1 0 LOAD_CONST 0 (<code object <listcomp> at 0x11a4ea8a0, file "?", line 1>)
2 LOAD_CONST 1 ('<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_NAME 0 (y)
8 GET_ITER
10 CALL_FUNCTION 1
12 POP_TOP
14 LOAD_CONST 2 (None)
16 RETURN_VALUE
So this makes a function object, with the function named <listcomp> (helpful for tracebacks), loads y, produces an iterator for it (the moral equivalent of iter(y)), and calls the function with that iterator as the argument.
If you wanted to translate that to pseudo-code, it would look like:
def <listcomp>(.0):
_[1] = []
for x in .0:
_[1].append(x)
return _[1]
<listcomp>(iter(y))
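Since .0 and _[1] are not valid identifiers, the pseudo-code above cannot be run as-is. A runnable approximation with ordinary names (my own, not what CPython actually generates) would be:

def _listcomp(it):
    result = []            # plays the role of _[1]
    for x in it:           # it plays the role of the .0 argument
        result.append(x)
    return result

y = [1, 2, 3]
print(_listcomp(iter(y)))  # [1, 2, 3], the same as [x for x in y]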
The _[1] temporary variable is of course not needed for generator expressions:
>>> symtable.symtable('(x for x in y)', '?', 'exec').get_children()[0].get_symbols()
[<symbol '.0'>, <symbol 'x'>]
Instead of appending to a list, the generator expression function object yields the values:
>>> dis.dis(compile('(x for x in y)', '?', 'exec').co_consts[0])
1 0 LOAD_FAST 0 (.0)
>> 2 FOR_ITER 10 (to 14)
4 STORE_FAST 1 (x)
6 LOAD_FAST 1 (x)
8 YIELD_VALUE
10 POP_TOP
12 JUMP_ABSOLUTE 2
>> 14 LOAD_CONST 0 (None)
16 RETURN_VALUE
Together with the outer bytecode, the generator expression is equivalent to:
def <genexpr>(.0):
for x in .0:
yield x
<genexpr>(iter(y))
* The temporary variable is actually no longer needed; they were used in the initial implementation of comprehensions, but this commit from April 2007 moved the compiler to just using the stack, and this has been the norm for all of the 3.x releases, as well as Python 2.7. It still is easier to think of the generated name as a reference to the stack. Because the variable is no longer needed, I filed issue 32836 to have it removed, and Python 3.8 and onwards will no longer include it in the symbol table.
In Python 2.6, you can still see the actual temporary name in the disassembly:
>>> import dis
>>> dis.dis(compile('[x for x in y]', '?', 'exec'))
1 0 BUILD_LIST 0
3 DUP_TOP
4 STORE_NAME 0 (_[1])
7 LOAD_NAME 1 (y)
10 GET_ITER
>> 11 FOR_ITER 13 (to 27)
14 STORE_NAME 2 (x)
17 LOAD_NAME 0 (_[1])
20 LOAD_NAME 2 (x)
23 LIST_APPEND
24 JUMP_ABSOLUTE 11
>> 27 DELETE_NAME 0 (_[1])
30 POP_TOP
31 LOAD_CONST 0 (None)
34 RETURN_VALUE
Note how the name actually has to be deleted again!
So list comprehensions are actually implemented by creating a code object; it is sort of like creating a one-time-use anonymous function, for scoping purposes:
>>> import dis
>>> def f(y): [x for x in y]
...
>>> dis.dis(f)
1 0 LOAD_CONST 1 (<code object <listcomp> at 0x101df9db0, file "<stdin>", line 1>)
3 LOAD_CONST 2 ('f.<locals>.<listcomp>')
6 MAKE_FUNCTION 0
9 LOAD_FAST 0 (y)
12 GET_ITER
13 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
16 POP_TOP
17 LOAD_CONST 0 (None)
20 RETURN_VALUE
>>>
Inspecting the code-object, I can find the .0 symbol:
>>> dis.dis(f.__code__.co_consts[1])
1 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE
Note that the LOAD_FAST in the list-comp code object loads the unnamed argument .0, which corresponds to the iterator produced by GET_ITER in the outer function.
Suppose I have two lists, A = [1,2,3,4] and B = [4,5,6]
I would like a list which includes the elements from both A and B. (I don't care if A itself gets altered).
A couple things I could do, and my understanding of them (please tell me if I am wrong):
A.extend(B) (elements of B get added in to A; A itself is altered)
C = A + B (makes a brand new object C, which contains the contents of A and B in it.)
I wanted to understand which is more efficient, so I was wondering if someone could please tell me whether my assumptions below are incorrect.
In the case of A.extend(B), I'm assuming Python only has to do 3 list add operations (the 3 elements of B, each of which it appends to A). However, in doing A + B, doesn't Python have to iterate through both lists A and B, in that case doing 7 list add operations? (i.e., it has to make a new list, go through A and put all the elements in it, and then go through B and put all the elements in it).
Am I misunderstanding how the interpreter handles these things, or what these operations do in python?
Below is the bytecode analysis of both operations. There is no major performance difference between the two. The only difference is that the .extend way involves a CALL_FUNCTION, which is slightly more expensive in Python than the BINARY_ADD.
But this should not be a problem unless you are working with huge amounts of data.
>>> import dis
>>> a = [1,2,3,4]
>>> b = [4,5,6]
>>> def f1(a,b):
... a.extend(b)
>>> def f2(a,b):
... c = a + b
>>> dis.dis(f1)
2 0 LOAD_FAST 0 (a)
3 LOAD_ATTR 0 (extend)
6 LOAD_FAST 1 (b)
9 CALL_FUNCTION 1
12 POP_TOP
13 LOAD_CONST 0 (None)
16 RETURN_VALUE
>>> dis.dis(f2)
2 0 LOAD_FAST 0 (a)
3 LOAD_FAST 1 (b)
6 BINARY_ADD
7 STORE_FAST 2 (c)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
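If you would rather measure than read bytecode, timeit is the usual tool (a rough sketch of my own; the numbers depend heavily on list sizes and your interpreter):

import timeit

setup = 'a = list(range(1000)); b = list(range(1000))'
# concatenation builds a new list each run
print(timeit.timeit('a + b', setup=setup, number=10000))
# copy a first so every run extends a list of the same size
print(timeit.timeit('c = list(a); c.extend(b)', setup=setup, number=10000))

The extend statement copies a first so that each run starts from the same list; extending a itself repeatedly would keep growing it and skew the timing.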
Disclaimer: I'm not new to programming, but I am new to Python. This may be a pretty basic question.
I have the following block of code:
for x in range(0, 100):
y = 1 + 1;
Is the calculation of 1 + 1 in the second line executed 100 times?
I have two suspicions why it might not:
1) The compiler sees 1 + 1 as a constant value, and thus compiles this line into y = 2;.
2) The compiler sees that y is only set and never referenced, so it omits this line of code.
Are either/both of these correct, or does it actually get executed each iteration over the loop?
Option 1 is what happens; the CPython compiler simplifies mathematical expressions with constants in the peephole optimiser.
Python will not eliminate the loop body, however.
You can introspect what Python produces by looking at the bytecode; use the dis module to take a look:
>>> import dis
>>> def f():
... for x in range(100):
... y = 1 + 1
...
>>> dis.dis(f)
2 0 SETUP_LOOP 26 (to 29)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (100)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 12 (to 28)
16 STORE_FAST 0 (x)
3 19 LOAD_CONST 3 (2)
22 STORE_FAST 1 (y)
25 JUMP_ABSOLUTE 13
>> 28 POP_BLOCK
>> 29 LOAD_CONST 0 (None)
32 RETURN_VALUE
The bytecode at position 19, LOAD_CONST loads the value 2 to store in y.
You can see the constants associated with the code object in the co_consts attribute of a code object; for functions you can find that object under the __code__ attribute:
>>> f.__code__.co_consts
(None, 100, 1, 2)
None is the default return value for any function, 100 is the literal passed to the range() call, 1 is the original literal (left in place by the peephole optimiser), and 2 is the result of the optimisation.
The work is done in peephole.c, in the fold_binops_on_constants() function:
/* Replace LOAD_CONST c1. LOAD_CONST c2 BINOP
with LOAD_CONST binop(c1,c2)
The consts table must still be in list form so that the
new constant can be appended.
Called with codestr pointing to the first LOAD_CONST.
Abandons the transformation if the folding fails (i.e. 1+'a').
If the new constant is a sequence, only folds when the size
is below a threshold value. That keeps pyc files from
becoming large in the presence of code like: (None,)*1000.
*/
Take into account that Python is a highly dynamic language; such optimisations can only be applied to literals and constants that you cannot later replace dynamically.
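To see the contrast, bind the operands to names instead of using literals; the addition is then no longer folded (a quick check of my own; the exact opcodes vary between Python versions):

>>> import dis
>>> def g():
...     one = 1
...     for x in range(100):
...         y = one + one
...
>>> dis.dis(g)  # the loop body now loads `one` twice and performs the addition on every iteration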