Given the following:
def foo():
    x = a_method_returning_a_long_list()
    y = a_method_which_filters_a_list(x)
    return y
will Python's bytecode compiler keep x & y in memory, or is it clever enough to reduce it to the following?
def foo():
    return a_method_which_filters_a_list(a_method_returning_a_long_list())
It keeps x and y in memory:
import dis
dis.dis(foo)
2 0 LOAD_GLOBAL 0 (a_method_returning_a_long_list)
3 CALL_FUNCTION 0
6 STORE_FAST 0 (x)
3 9 LOAD_GLOBAL 1 (a_method_which_filters_a_list)
12 LOAD_FAST 0 (x)
15 CALL_FUNCTION 1
18 STORE_FAST 1 (y)
4 21 LOAD_FAST 1 (y)
24 RETURN_VALUE
The whole operation is still quite efficient, since it is done with the LOAD_FAST and STORE_FAST opcodes.
As Roadrunner-EX remarks in one of the comments, the amount of memory used by your two versions of foo is basically the same, as x and y are just references (i.e., pointers) to the results.
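A quick sketch illustrating that point (with hypothetical names): assignment binds a name to an existing object rather than copying it, so a local variable costs a single reference, no matter how large the object is.

```python
import sys

big = list(range(1_000_000))
alias = big  # binds a second name to the same list object, no copy

print(alias is big)        # True: both names refer to one object
print(sys.getsizeof(big))  # size of the list object itself, paid only once
```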
In [1]: import dis
In [2]: def f():
   ...:     x = f1()
   ...:     y = f2(x)
   ...:     return y
   ...:
In [3]: dis.dis(f)
2 0 LOAD_GLOBAL 0 (f1)
3 CALL_FUNCTION 0
6 STORE_FAST 0 (x)
3 9 LOAD_GLOBAL 1 (f2)
12 LOAD_FAST 0 (x)
15 CALL_FUNCTION 1
18 STORE_FAST 1 (y)
4 21 LOAD_FAST 1 (y)
24 RETURN_VALUE
So it looks like both variables are held separately.
I'm not certain, but I would guess it would keep them in memory, for two reasons. First, it's probably more effort than it's worth to do that; there wouldn't be a huge performance change either way. Second, the variables x and y themselves take up memory (in the form of pointers/references), which the compiler would not touch, due to the explicit nature of the assignment.
Is using magic methods (like int.__add__()) quicker than using operators (like +)?
Will it make a difference, even a small one?
Thanks.
Here is the disassembled byte code for 3 different ways of adding.
import dis
def add1(a, b):
    return a + b
dis.dis(add1)
2 0 LOAD_FAST 0 (a)
2 LOAD_FAST 1 (b)
4 BINARY_ADD
6 RETURN_VALUE
def add2(a, b):
    return a.__add__(b)
dis.dis(add2)
2 0 LOAD_FAST 0 (a)
2 LOAD_ATTR 0 (__add__)
4 LOAD_FAST 1 (b)
6 CALL_FUNCTION 1
8 RETURN_VALUE
def add3(a, b):
    return int.__add__(a, b)
dis.dis(add3)
2 0 LOAD_GLOBAL 0 (int)
2 LOAD_ATTR 1 (__add__)
4 LOAD_FAST 0 (a)
6 LOAD_FAST 1 (b)
8 CALL_FUNCTION 2
10 RETURN_VALUE
a + b generates the simplest byte code, but I expect that the interpreter's code for BINARY_ADD simply calls the first argument's __add__() method, so it's effectively the same as a.__add__(b).
int.__add__(a, b) looks like it might be faster because it doesn't have to find the method for a specific object, but looking up the int.__add__ attribute may be just as expensive.
If you really want to find out which is best, I suggest you run benchmarks.
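A minimal benchmark along those lines, using timeit from the standard library; exact numbers vary by machine and Python version, so treat this as a sketch rather than a verdict:

```python
import timeit

# Benchmark the three spellings of the same addition.
setup = "a, b = 3, 4"
for stmt in ("a + b", "a.__add__(b)", "int.__add__(a, b)"):
    elapsed = timeit.timeit(stmt, setup=setup, number=1_000_000)
    print(f"{stmt:<20} {elapsed:.3f}s")
```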
In this trivial example, I want to factor out the i < 5 condition of a list comprehension into its own function. I also want to have my cake and eat it too, and avoid the overhead of the CALL_FUNCTION bytecode/creating a new frame in the Python virtual machine.
Is there any way to factor out the conditions inside of a list comprehension into a new function but somehow get a disassembled result that avoids the large overhead of CALL_FUNCTION?
import dis
import sys
import timeit
def my_filter(n):
    return n < 5

def a():
    # list comprehension with function call
    return [i for i in range(10) if my_filter(i)]

def b():
    # list comprehension without function call
    return [i for i in range(10) if i < 5]
assert a() == b()
>>> sys.version_info[:]
(3, 6, 5, 'final', 0)
>>> timeit.timeit(a)
1.2616060493517098
>>> timeit.timeit(b)
0.685117881097812
>>> dis.dis(a)
3 0 LOAD_CONST 1 (<code object <listcomp> at 0x0000020F4890B660, file "<stdin>", line 3>)
# ...
>>> dis.dis(b)
3 0 LOAD_CONST 1 (<code object <listcomp> at 0x0000020F48A42270, file "<stdin>", line 3>)
# ...
# list comprehension with function call
# big overhead with that CALL_FUNCTION at address 12
>>> dis.dis(a.__code__.co_consts[1])
3 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_GLOBAL 0 (my_filter)
10 LOAD_FAST 1 (i)
12 CALL_FUNCTION 1
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
# list comprehension without function call
>>> dis.dis(b.__code__.co_consts[1])
3 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
I'm willing to take a hacky solution that I would never use in production, like somehow replacing the bytecode at run time.
In other words, is it possible to replace a's addresses 8, 10, and 12 with b's 8, 10, and 12 at runtime?
Consolidating all of the excellent answers in the comments into one.
As georg says, this sounds like you are looking for a way to inline a function or an expression, and there is no such thing in CPython, although attempts have been made: https://bugs.python.org/issue10399
Therefore, along the lines of "metaprogramming", you can build the lambdas inline and eval them:
from typing import Callable
import dis
def b():
    # list comprehension without function call
    return [i for i in range(10) if i < 5]

def gen_list_comprehension(expr: str) -> Callable:
    return eval(f"lambda: [i for i in range(10) if {expr}]")
a = gen_list_comprehension("i < 5")
dis.dis(a.__code__.co_consts[1])
print("=" * 10)
dis.dis(b.__code__.co_consts[1])
which when run under 3.7.6 gives:
6 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
==========
1 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 16 (to 22)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LOAD_CONST 0 (5)
12 COMPARE_OP 0 (<)
14 POP_JUMP_IF_FALSE 4
16 LOAD_FAST 1 (i)
18 LIST_APPEND 2
20 JUMP_ABSOLUTE 4
>> 22 RETURN_VALUE
From a security standpoint, "eval" is dangerous, although here it is less so because of the limits on what you can do inside a lambda. What can be done in an IfExp expression is even more limited, but it is still dangerous, e.g., calling a function that does evil things.
However, if you want the same effect in a more secure way, you can work with ASTs instead of strings. I find that a lot more cumbersome, though.
A hybrid approach would be to call ast.parse() and check the result. For example:
import ast

def is_cond_str(s: str) -> bool:
    try:
        mod_ast = ast.parse(s)
        expr_ast = mod_ast.body[0]
        if not isinstance(expr_ast, ast.Expr):
            return False
        compare_ast = expr_ast.value
        if not isinstance(compare_ast, ast.Compare):
            return False
        return True
    except SyntaxError:
        return False
This is a little more secure, but there still may be evil functions in the condition so you could keep going. Again, I find this a little tedious.
Coming from the other direction of starting off with bytecode, there is my cross-version assembler; see https://pypi.org/project/xasm/
I need to assign values to a bunch of variables. If the value is None, the variable should stay put, but if there is a value, it should get assigned. The obvious way is
if v is not None:
    x = v
but repeating this construct over and over again uglifies the code. Doing this works
x = v if v is not None else x
but it does an unnecessary assignment operation and this is a frequently executed code path.
Is there a better way? Or does python optimize something like this and there is no assignment?
Using the dis module, we can examine the assembled python.
import dis

def a(v):
    if v is not None:
        x = v

def b(v):
    x = v if v is not None else x
It would appear that the second method is actually slightly faster, though only marginally.
>>> dis.dis(a)
2 0 LOAD_FAST 0 (v)
2 LOAD_CONST 0 (None)
4 COMPARE_OP 9 (is not)
6 POP_JUMP_IF_FALSE 12
8 LOAD_FAST 0 (v)
10 JUMP_FORWARD 2 (to 14)
>> 12 LOAD_FAST 1 (x)
>> 14 STORE_FAST 1 (x)
16 LOAD_CONST 0 (None)
18 RETURN_VALUE
>>> dis.dis(b)
4 0 LOAD_FAST 0 (v)
2 LOAD_CONST 0 (None)
4 COMPARE_OP 9 (is not)
6 POP_JUMP_IF_FALSE 12
5 8 LOAD_FAST 0 (v)
10 STORE_FAST 1 (x)
>> 12 LOAD_CONST 0 (None)
14 RETURN_VALUE
That being said, pick whichever is more readable or more accepted. I don't think a difference of two instructions is noticeable at any scale.
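To put rough numbers on that, here is a timeit sketch of the two functions from above; absolute timings depend on machine and Python version, so only the relative gap is meaningful:

```python
import timeit

setup = """
def a(v):
    if v is not None:
        x = v

def b(v):
    # note: this raises UnboundLocalError when v is None, since x has
    # no prior value inside the function, so we benchmark with v=1 only
    x = v if v is not None else x
"""

for name in ("a", "b"):
    elapsed = timeit.timeit(f"{name}(1)", setup=setup, number=1_000_000)
    print(name, f"{elapsed:.3f}s")
```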
In matlab, if we have a function that returns multiple variables, we do something like
[output1, output2] = some_func()
In Python, you can simply do
output1, output2 = some_func()
Or you could do
[output1, output2] = some_func()
Or
(output1, output2) = some_func()
The last two presumably make a temporary list and tuple, respectively, but they are not assigned to anything, and you can access the two output variables exactly as in the case without [] or (). Is there anything functionally advantageous to using the last two syntaxes besides looking a little more elegant?
The only difference I can think of between
[output1, output2] = some_func()
and
(output1, output2) = some_func()
is that the latter's memory footprint should be smaller (though I'm sure this is an implementation detail), since tuples take less memory than lists with the same number of elements, mainly because tuples are immutable (so the interpreter need not plan for adding or removing elements, i.e., re-allocating memory).
import sys
print(sys.getsizeof([1, 2]))
print(sys.getsizeof((1, 2)))
print(sys.getsizeof([1, 2, 3, 4]))
print(sys.getsizeof((1, 2, 3, 4)))
print(sys.getsizeof(list(range(1000))))
print(sys.getsizeof(tuple(range(1000))))
# 80
# 64
# 96
# 80
# 9112
# 8048
The generated bytecode is exactly the same for all 3 examples:
from dis import dis
def foo(): return 1, 2
def a():
    output1, output2 = foo()

def b():
    [output1, output2] = foo()

def c():
    (output1, output2) = foo()
dis(a)
print('-----------------------------------------------------')
dis(b)
print('-----------------------------------------------------')
dis(c)
outputs
81 0 LOAD_GLOBAL 0 (foo)
2 CALL_FUNCTION 0
4 UNPACK_SEQUENCE 2
6 STORE_FAST 0 (output1)
8 STORE_FAST 1 (output2)
10 LOAD_CONST 0 (None)
12 RETURN_VALUE
-----------------------------------------------------
85 0 LOAD_GLOBAL 0 (foo)
2 CALL_FUNCTION 0
4 UNPACK_SEQUENCE 2
6 STORE_FAST 0 (output1)
8 STORE_FAST 1 (output2)
10 LOAD_CONST 0 (None)
12 RETURN_VALUE
-----------------------------------------------------
89 0 LOAD_GLOBAL 0 (foo)
2 CALL_FUNCTION 0
4 UNPACK_SEQUENCE 2
6 STORE_FAST 0 (output1)
8 STORE_FAST 1 (output2)
10 LOAD_CONST 0 (None)
12 RETURN_VALUE
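One way to confirm that claim programmatically, rather than by eyeballing the disassembly: the raw bytecode of the three functions is byte-for-byte identical.

```python
def foo(): return 1, 2

def a():
    output1, output2 = foo()

def b():
    [output1, output2] = foo()

def c():
    (output1, output2) = foo()

# co_code holds the compiled instruction bytes; if the three forms
# compiled differently, these would not compare equal.
print(a.__code__.co_code == b.__code__.co_code == c.__code__.co_code)  # True
```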
In Python, while assigning a value to a variable, we can either do:
variable = variable + 20
or
variable += 20
While I understand that both operations are semantically the same, i.e., they increase the previous value of variable by 20, I was wondering whether there are subtle run-time performance differences between the two, or any other slight differences that might make one preferable.
Is there any such difference, or are they exactly the same?
If there is any difference, is it the same for other languages such as C++?
Thanks.
Perhaps this can help you understand better:
import dis

def a():
    x = 0
    x += 20
    return x

def b():
    x = 0
    x = x + 20
    return x

print 'In place add'
dis.dis(a)
print 'Binary add'
dis.dis(b)
We get the following outputs:
In place add
4 0 LOAD_CONST 1 (0)
3 STORE_FAST 0 (x)
5 6 LOAD_FAST 0 (x)
9 LOAD_CONST 2 (20)
12 INPLACE_ADD
13 STORE_FAST 0 (x)
6 16 LOAD_FAST 0 (x)
19 RETURN_VALUE
Binary add
9 0 LOAD_CONST 1 (0)
3 STORE_FAST 0 (x)
10 6 LOAD_FAST 0 (x)
9 LOAD_CONST 2 (20)
12 BINARY_ADD
13 STORE_FAST 0 (x)
11 16 LOAD_FAST 0 (x)
19 RETURN_VALUE
You could loop a thousand or so times with a timer to compare performance, but the main difference is the one shown above: INPLACE_ADD versus BINARY_ADD. I suppose binary add should be faster, though.
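One caveat the integer example doesn't show: for mutable objects such as lists, the two spellings are not equivalent. += (INPLACE_ADD) mutates the object in place, while x = x + y (BINARY_ADD) builds a new object and rebinds the name. A quick sketch:

```python
a = [1, 2]
b = a
a += [3]      # INPLACE_ADD: mutates the shared list in place
print(b)      # [1, 2, 3] -- b sees the change

c = [1, 2]
d = c
c = c + [3]   # BINARY_ADD: builds a new list and rebinds c
print(d)      # [1, 2] -- d still refers to the old list
```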