Say I have a list called list, which is made up of boolean values. Also say that I have some (valid) index i at which I want to switch the value in list.
Currently, I have: list[i] = not list[i].
But my question is, doesn't this iterate through list twice? If so, is there a way to set up a temp value through aliasing so the list is only iterated through once?
I tried the following:
temp = list[i]
temp = not temp
But this has not worked for me; it only switches the value of temp, not the value of list[i].
You can look a little way 'under the hood' using the dis module: https://docs.python.org/3/library/dis.html
import dis
boolst = [True, True, True, True, True]
dis.dis('boolst[2] = not boolst[2]')
1 0 LOAD_NAME 0 (boolst)
2 LOAD_CONST 0 (2)
4 BINARY_SUBSCR
6 UNARY_NOT
8 LOAD_NAME 0 (boolst)
10 LOAD_CONST 0 (2)
12 STORE_SUBSCR
14 LOAD_CONST 1 (None)
16 RETURN_VALUE
dis.dis('boolst[2] ^= True')
1 0 LOAD_NAME 0 (boolst)
2 LOAD_CONST 0 (2)
4 DUP_TOP_TWO
6 BINARY_SUBSCR
8 LOAD_CONST 1 (True)
10 INPLACE_XOR
12 ROT_THREE
14 STORE_SUBSCR
16 LOAD_CONST 2 (None)
18 RETURN_VALUE
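Note that boolst[2] only indexes the list, which is O(1) in CPython, so neither version iterates the list at all; the first one just subscripts it twice. For completeness, here is a minimal sketch of the temp-variable approach from the question, with the missing write-back added:
boolst = [True, True, True, True, True]
temp = not boolst[2]   # one O(1) read of the list
boolst[2] = temp       # one O(1) write back into the list
assert boolst[2] is False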
I am converting a list to a set and back to a list. I know that set(list) takes O(n) time, but I am converting it back to a list on the same line with list(set(list)). Since both of these operations take O(n) time, would the time complexity be O(n^2) now?
Logic 1:
final = list(set(list1)-set(list2))
Logic 2:
s = set(list1)-set(list2)
final = list(s)
Do these two implementations have different time complexities, and if they do, which of them is more efficient?
One set conversion is unnecessary, as set.difference works with any iterable as it should:
final = list(set(list1).difference(list2))
But the asymptotic time and space complexity of the whole thing is still O(m+n), where m and n are the sizes of the two lists.
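For instance, with a couple of hypothetical lists (sorted() is only there to sidestep set ordering in the comparison):
list1 = [1, 2, 3, 4]
list2 = [3, 4, 5]
assert sorted(set(list1) - set(list2)) == sorted(set(list1).difference(list2)) == [1, 2]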
In both versions you are doing:
convert a list to a set
convert another list to a set
subtract the two sets
convert the resultant set to a list
Each of those is O(n), where n is a bound on the size of both your starting lists.
That's O(n) four times; composing the steps just adds their costs (O(4n)), and constant factors are dropped, so it is overall O(n), not O(n^2).
Your two versions are essentially identical.
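If you want to back that up empirically, a rough timeit sketch along these lines (made-up data; absolute numbers will vary by machine) shows all three variants, including the set.difference one above, scale the same way:
import timeit
import random

random.seed(0)
list1 = [random.randrange(1000) for _ in range(10000)]
list2 = [random.randrange(1000) for _ in range(10000)]
setup = "from __main__ import list1, list2"

print(timeit.timeit("list(set(list1) - set(list2))", setup=setup, number=1000))
print(timeit.timeit("s = set(list1) - set(list2); list(s)", setup=setup, number=1000))
print(timeit.timeit("list(set(list1).difference(list2))", setup=setup, number=1000))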
Your two versions are identical, as dis shows:
>>> dis.dis("list(set(list1) - set(list2))")
1 0 LOAD_NAME 0 (list)
2 LOAD_NAME 1 (set)
4 LOAD_NAME 2 (list1)
6 CALL_FUNCTION 1
8 LOAD_NAME 1 (set)
10 LOAD_NAME 3 (list2)
12 CALL_FUNCTION 1
14 BINARY_SUBTRACT
16 CALL_FUNCTION 1
18 RETURN_VALUE
>>> dis.dis("s = set(list1) - set(list2);list(s)")
1 0 LOAD_NAME 0 (set)
2 LOAD_NAME 1 (list1)
4 CALL_FUNCTION 1
6 LOAD_NAME 0 (set)
8 LOAD_NAME 2 (list2)
10 CALL_FUNCTION 1
12 BINARY_SUBTRACT
14 STORE_NAME 3 (s)
16 LOAD_NAME 4 (list)
18 LOAD_NAME 3 (s)
20 CALL_FUNCTION 1
22 POP_TOP
24 LOAD_CONST 0 (None)
26 RETURN_VALUE
>>>
The only difference between these two is that the second one stores the result to a variable s, so it has to LOAD_NAME the name s; also, in the first snippet list is loaded first, while in the second it gets loaded later.
But in #user2390182's answer, instead of a second LOAD_NAME for set, it loads the method name difference, which IMO is the most efficient version here:
>>> dis.dis("list(set(list1).difference(list2))")
1 0 LOAD_NAME 0 (list)
2 LOAD_NAME 1 (set)
4 LOAD_NAME 2 (list1)
6 CALL_FUNCTION 1
8 LOAD_METHOD 3 (difference)
10 LOAD_NAME 4 (list2)
12 CALL_METHOD 1
14 CALL_FUNCTION 1
16 RETURN_VALUE
>>>
In this trivial example, I want to factor out the i < 5 condition of a list comprehension into its own function. I also want to eat my cake and have it too, and avoid the overhead of the CALL_FUNCTION bytecode/creating a new frame in the Python virtual machine.
Is there any way to factor out the conditions inside of a list comprehension into a new function but somehow get a disassembled result that avoids the large overhead of CALL_FUNCTION?
import dis
import sys
import timeit
def my_filter(n):
    return n < 5

def a():
    # list comprehension with function call
    return [i for i in range(10) if my_filter(i)]

def b():
    # list comprehension without function call
    return [i for i in range(10) if i < 5]
assert a() == b()
>>> sys.version_info[:]
(3, 6, 5, 'final', 0)
>>> timeit.timeit(a)
1.2616060493517098
>>> timeit.timeit(b)
0.685117881097812
>>> dis.dis(a)
3 0 LOAD_CONST 1 (<code object <listcomp> at 0x0000020F4890B660, file "<stdin>", line 3>)
# ...
>>> dis.dis(b)
3 0 LOAD_CONST 1 (<code object <listcomp> at 0x0000020F48A42270, file "<stdin>", line 3>)
# ...
# list comprehension with function call
# big overhead with that CALL_FUNCTION at address 12
>>> dis.dis(a.__code__.co_consts[1])
3 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_GLOBAL 0 (my_filter)
10 LOAD_FAST 1 (i)
12 CALL_FUNCTION 1
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
# list comprehension without function call
>>> dis.dis(b.__code__.co_consts[1])
3 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
I'm willing to take a hacky solution that I would never use in production, like somehow replacing the bytecode at run time.
In other words, is it possible to replace a's addresses 8, 10, and 12 with b's 8, 10, and 12 at runtime?
Consolidating all of the excellent answers in the comments into one.
As georg says, this sounds like you are looking for a way to inline a function or an expression, and there is no such thing in CPython, though attempts have been made: https://bugs.python.org/issue10399
Therefore, along the lines of "metaprogramming", you can build the lambdas inline and eval() them:
from typing import Callable
import dis
def b():
    # list comprehension without function call
    return [i for i in range(10) if i < 5]

def gen_list_comprehension(expr: str) -> Callable:
    return eval(f"lambda: [i for i in range(10) if {expr}]")
a = gen_list_comprehension("i < 5")
dis.dis(a.__code__.co_consts[1])
print("=" * 10)
dis.dis(b.__code__.co_consts[1])
which when run under 3.7.6 gives:
6 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
==========
1 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
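The generated lambda behaves just like the hand-written comprehension, so a quick sanity check along these lines should pass:
a = gen_list_comprehension("i < 5")
assert a() == b() == [0, 1, 2, 3, 4]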
From a security standpoint, eval is dangerous, although here it is less so because of the limits on what you can do inside a lambda. What can be done in an IfExp expression is even more limited, but it is still dangerous, e.g. it could call a function that does evil things.
However, if you want the same effect that is more secure, instead of working with strings you can modify AST's. I find that a lot more cumbersome though.
A hybrid approach would be to call ast.parse() and check the result. For example:
import ast
def is_cond_str(s: str) -> bool:
    try:
        mod_ast = ast.parse(s)
        expr_ast = mod_ast.body[0]  # the first (and only) statement
        if not isinstance(expr_ast, ast.Expr):
            return False
        compare_ast = expr_ast.value
        if not isinstance(compare_ast, ast.Compare):
            return False
        return True
    except Exception:  # e.g. SyntaxError from ast.parse
        return False
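With that helper, a few quick checks (made-up inputs) behave as you'd expect:
assert is_cond_str("i < 5")            # a bare comparison is accepted
assert not is_cond_str("print('hi')")  # a call expression is rejected
assert not is_cond_str("i <")          # a syntax error is rejected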
This is a little more secure, but there still may be evil functions in the condition so you could keep going. Again, I find this a little tedious.
Coming from the other direction of starting off with bytecode, there is my cross-version assembler; see https://pypi.org/project/xasm/
I need to assign values to a bunch of variables. If the value is None, the variable should stay put, but if there is a value, it should get assigned. The obvious way is
if v is not None:
    x = v
but repeating this construct over and over again uglifies the code. Doing this works
x = v if v is not None else x
but it does an unnecessary assignment operation and this is a frequently executed code path.
Is there a better way? Or does Python optimize something like this so that there is no assignment?
Using the dis module, we can examine the compiled bytecode.
import dis
def a(v):
    x = v if v is not None else x

def b(v):
    if v is not None:
        x = v
It would appear that the second method is actually slightly faster, although only marginally so.
>>> dis.dis(a)
2 0 LOAD_FAST 0 (v)
2 LOAD_CONST 0 (None)
4 COMPARE_OP 9 (is not)
6 POP_JUMP_IF_FALSE 12
8 LOAD_FAST 0 (v)
10 JUMP_FORWARD 2 (to 14)
>> 12 LOAD_FAST 1 (x)
>> 14 STORE_FAST 1 (x)
16 LOAD_CONST 0 (None)
18 RETURN_VALUE
>>> dis.dis(b)
4 0 LOAD_FAST 0 (v)
2 LOAD_CONST 0 (None)
4 COMPARE_OP 9 (is not)
6 POP_JUMP_IF_FALSE 12
5 8 LOAD_FAST 0 (v)
10 STORE_FAST 1 (x)
>> 12 LOAD_CONST 0 (None)
14 RETURN_VALUE
That being said, pick whatever is more readable, or more accepted. I don't think two instructions is noticeable on any scale.
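If you want numbers rather than opcode counts, a rough timeit sketch like this (assuming a and b above are defined at module level, and calling with a non-None value so both branches actually assign) is one way to confirm the gap is in the noise:
import timeit
print(timeit.timeit("a(1)", setup="from __main__ import a"))
print(timeit.timeit("b(1)", setup="from __main__ import b"))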
I have the following codes,
In [4]: def foo():
   ...:     a = 2
   ...:     b = 3
   ...:     return a + b
   ...:
   ...:
In [5]: import dis
In [6]: dis.dis(foo)
2 0 LOAD_CONST 1 (2)
2 STORE_FAST 0 (a)
3 4 LOAD_CONST 2 (3)
6 STORE_FAST 1 (b)
4 8 LOAD_FAST 0 (a)
10 LOAD_FAST 1 (b)
12 BINARY_ADD
14 RETURN_VALUE
Reference to the bytecodes:
I know that:
the first column is line-number: 2, 3, 4
the third column is op-names: LOAD_CONST etc
the fifth column is the values in parentheses: (2), (a)
How about the second column: 0, 2, 4, 6, 8...
and the fourth column: 1, 0, 2, 1?
Could you please provide a hint on where to find the related info?
The second column is the byte offset of the instruction within the bytecode; each instruction consists of 2 bytes (one indicating the exact opcode, the other the opcode's argument value). In dis's full column layout it is actually column #4; there are two columns with no value in your output. The fourth column you see is the raw opcode argument, i.e. an index into co_consts, co_varnames, etc., whose interpretation is what appears in parentheses.
For your function, you can find the bytestring that contains the bytecode as the __code__.co_code attribute:
>>> foo.__code__.co_code
b'd\x01}\x00d\x02}\x01|\x00|\x01\x17\x00S\x00'
So b'd\x01' is LOAD_CONST 1, b'}\x00' is STORE_FAST 0, etc.
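If you would rather get at those numbers programmatically than read the printed table, dis.get_instructions() yields Instruction objects carrying the same fields; a small sketch (foo repeated here so it runs on its own):
import dis

def foo():
    a = 2
    b = 3
    return a + b

# offset is the printed offset column, arg the raw argument, argrepr the value in parentheses
for ins in dis.get_instructions(foo):
    print(ins.offset, ins.opname, ins.arg, ins.argrepr)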
This is documented under the dis.disco() function:
The output is divided in the following columns:
the line number, for the first instruction of each line
the current instruction, indicated as -->,
a labelled instruction, indicated with >>,
the address of the instruction,
the operation code name,
operation parameters, and
interpretation of the parameters in parentheses.
When you use dis.dis(), column #2 (current instruction) will always be empty.
Column #3, the labelled instruction, is used whenever there's a loop or test. For example:
>>> dis.dis('if foo:\n for i in it:\n print(i)\nelse: print(bar)')
1 0 LOAD_NAME 0 (foo)
2 POP_JUMP_IF_FALSE 28
2 4 SETUP_LOOP 30 (to 36)
6 LOAD_NAME 1 (it)
8 GET_ITER
>> 10 FOR_ITER 12 (to 24)
12 STORE_NAME 2 (i)
3 14 LOAD_NAME 3 (print)
16 LOAD_NAME 2 (i)
18 CALL_FUNCTION 1
20 POP_TOP
22 JUMP_ABSOLUTE 10
>> 24 POP_BLOCK
26 JUMP_FORWARD 8 (to 36)
4 >> 28 LOAD_NAME 3 (print)
30 LOAD_NAME 4 (bar)
32 CALL_FUNCTION 1
34 POP_TOP
>> 36 LOAD_CONST 0 (None)
38 RETURN_VALUE
There are 4 jump targets here, i.e. positions that one or more opcodes can jump to. The >> markers serve as a visual aid to ease reading.
As user2357112 pointed out, the original performance test script didn't work the way I expected: "after the first execution of s1, your list has no 1s in it, so no further executions of s1 and no executions of s2 actually take the x==1 branch."
The modified version:
import timeit
import random
random.seed(0)
a = [ random.randrange(10) for _ in range(10000)]
change_from = 1
change_to = 6
setup = "from __main__ import a, change_from, change_to"
# s1 is replaced with a simple for loop, which is faster than the original
s1 = """\
for i,x in enumerate(a):
    if x == change_from:
        a[i] = change_to
change_from, change_to = change_to, change_from
"""
s2 = """\
a = [change_to if x==change_from else x for x in a]
change_from, change_to = change_to, change_from
"""
print(timeit.timeit(stmt=s1,number=10000, setup=setup))
print(timeit.timeit(stmt=s2, number=10000, setup=setup))
This script replaces every occurrence of 1 with 6, and on the next run every occurrence of 6 with 1, and so on. The result is:
7.841739330212443
5.5166219217914065
Why is the list comprehension faster?
And how should one figure out this kind of question?
boardrider's comment looks interesting, thanks.
The following python version is used:
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
Since I didn't get a detailed answer, I tried to figure it out myself.
If I represent the simple for loop with this function:
def func1(a):
    for i,x in enumerate(a):
        if x == 1:
            a[i] = 6
    return(a)
and disassemble it, I get the following:
func1:
7 0 SETUP_LOOP 48 (to 51)
3 LOAD_GLOBAL 0 (enumerate)
6 LOAD_FAST 0 (a)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 34 (to 50)
16 UNPACK_SEQUENCE 2
19 STORE_FAST 1 (i)
22 STORE_FAST 2 (x)
8 25 LOAD_FAST 2 (x)
28 LOAD_CONST 1 (1)
31 COMPARE_OP 2 (==)
34 POP_JUMP_IF_FALSE 13
9 37 LOAD_CONST 2 (6)
40 LOAD_FAST 0 (a)
43 LOAD_FAST 1 (i)
46 STORE_SUBSCR
47 JUMP_ABSOLUTE 13
>> 50 POP_BLOCK
10 >> 51 LOAD_FAST 0 (a)
54 RETURN_VALUE
This is simple. It iterates through a, and when it finds the value 1 it replaces it with 6 via STORE_SUBSCR.
If I represent the comprehension variant with this function:
def func2(a):
    a = [6 if x==1 else x for x in a]
    return(a)
and disassemble it, I got the following:
func2:
7 0 LOAD_CONST 1 (<code object <listcomp> at 0x00000000035731E0, file "<file_path>", line 7>)
3 LOAD_CONST 2 ('func2.<locals>.<listcomp>')
6 MAKE_FUNCTION 0
9 LOAD_FAST 0 (a)
12 GET_ITER
13 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
16 STORE_FAST 0 (a)
8 19 LOAD_FAST 0 (a)
22 RETURN_VALUE
This is shorter than the previous one. However, it starts by loading a code object. func2 has the following code constants:
>>> func2.__code__.co_consts
(None, <code object <listcomp> at 0x00000000035731E0, file "<file_path>", line 7>, 'func2.<locals>.<listcomp>')
and the listcomp code object looks like this:
>>> dis.dis(func2.__code__.co_consts[1].co_code)
0 BUILD_LIST 0
3 LOAD_FAST 0 (0)
>> 6 FOR_ITER 30 (to 39)
9 STORE_FAST 1 (1)
12 LOAD_FAST 1 (1)
15 LOAD_CONST 0 (0)
18 COMPARE_OP 2 (==)
21 POP_JUMP_IF_FALSE 30
24 LOAD_CONST 1 (1)
27 JUMP_FORWARD 3 (to 33)
>> 30 LOAD_FAST 1 (1)
>> 33 LIST_APPEND 2
36 JUMP_ABSOLUTE 6
>> 39 RETURN_VALUE
So essentially the two implementations perform similar steps. The main difference is that in the comprehension version the explicit loop bytecode in func2 is replaced by a single CALL_FUNCTION that runs the <listcomp> code object's own FOR_ITER loop.
From this I should see why the list comprehension is faster, but I don't.
So my original question is still on:
Why is the list comprehension faster?