Line numbers in the output of the Python disassembler (dis)

I recently started learning Python.
I am wondering what the line numbers in the output of dis mean.
import dis
def add(a, b):
    a += 1
    return a + b
dis.dis(add)
  3           0 LOAD_FAST                0 (a)
              2 LOAD_CONST               1 (1)
              4 INPLACE_ADD
              6 STORE_FAST               0 (a)

  4           8 LOAD_FAST                0 (a)
             10 LOAD_FAST                1 (b)
             12 BINARY_ADD
             14 RETURN_VALUE
The output shows line numbers 3 and 4.
Where are lines 1 and 2?

With dis.dis(add) you disassemble only your function add, so only two line numbers appear; the body of add is only two lines long. The numbers in the left-hand column are source line numbers: a += 1 is line 3 and return a + b is line 4 of your snippet, while line 1 (import dis) and line 2 (def add(a, b):) are not part of the function body and therefore contribute no bytecode to it.
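If you want to see bytecode attributed to lines 1 and 2, a minimal sketch (assuming the snippet above is the whole file) is to compile the module source yourself and disassemble that instead of the function:

import dis

source = """import dis
def add(a, b):
    a += 1
    return a + b
"""
# Disassembling the whole module attributes bytecode to line 1 (the import)
# and line 2 (the def statement) as well as to the function body.
dis.dis(compile(source, "<example>", "exec"))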


Is using magic methods quicker than using operators in Python?

I want to ask whether using magic methods (like int.__add__()) is quicker than using operators (like +).
Will it make a difference, even by a bit?
Thanks.
Here is the disassembled byte code for 3 different ways of adding.
import dis
def add1(a, b):
    return a + b
dis.dis(add1)
  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE
def add2(a, b):
    return a.__add__(b)
dis.dis(add2)
  2           0 LOAD_FAST                0 (a)
              2 LOAD_ATTR                0 (__add__)
              4 LOAD_FAST                1 (b)
              6 CALL_FUNCTION            1
              8 RETURN_VALUE
def add3(a, b):
    return int.__add__(a, b)
dis.dis(add3)
  2           0 LOAD_GLOBAL              0 (int)
              2 LOAD_ATTR                1 (__add__)
              4 LOAD_FAST                0 (a)
              6 LOAD_FAST                1 (b)
              8 CALL_FUNCTION            2
             10 RETURN_VALUE
a+b generates the simplest byte code, but I expect that the interpreter's code for BINARY_ADD simply calls the first argument's __add__() method, so it's effectively the same as a.__add__(b).
int.__add__(a, b) looks like it might be faster because it doesn't have to find the method for a specific object, but looking up the int.__add__ attribute may be just as expensive.
If you really want to find out which is best, I suggest you run benchmarks.
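For example, a rough timeit sketch along those lines (timings vary by Python version and machine) might look like this:

import timeit

setup = "a, b = 3, 4"
for stmt in ("a + b", "a.__add__(b)", "int.__add__(a, b)"):
    # Take the best of several repeats to reduce noise.
    best = min(timeit.repeat(stmt, setup=setup, number=1_000_000, repeat=5))
    print(f"{stmt:20} {best:.3f} s per million additions")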

Branchless method to convert false/true to -1/+1? [closed]

What is a branchless way to do the following mapping?
true -> +1
false -> -1
An easy way would be if b then 1 else -1, but I'm looking for a method that avoids the branch, i.e. the if.
If it is relevant, I'm using Python.
Here's a comparison of the solutions posted in comments and answers so far.
We can use the dis module to see the generated bytecode in each case; this confirms that there are no conditional jump instructions (in the Python code itself, at least), and also tells us something about the expected performance, since the number of opcodes executed has a direct impact on that (though they are not perfectly correlated). The number of function calls is also relevant for performance, since these have a particularly high overhead.
#Glannis Clipper and #kaya3: (-1, 1)[b] (3 opcodes)
  1           0 LOAD_CONST               2 ((-1, 1))
              3 LOAD_NAME                0 (b)
              6 BINARY_SUBSCR

#HeapOverflow: -(-1)**b (4 opcodes)
  1           0 LOAD_CONST               0 (-1)
              2 LOAD_NAME                0 (b)
              4 BINARY_POWER
              6 UNARY_NEGATIVE

#HeapOverflow: b - (not b) (4 opcodes)
  1           0 LOAD_NAME                0 (b)
              2 LOAD_NAME                0 (b)
              4 UNARY_NOT
              6 BINARY_SUBTRACT

#kaya3: 2 * b - 1 (5 opcodes)
  1           0 LOAD_CONST               0 (2)
              3 LOAD_NAME                0 (b)
              6 BINARY_MULTIPLY
              7 LOAD_CONST               1 (1)
             10 BINARY_SUBTRACT

#HeapOverflow: ~b ^ -b (5 opcodes)
  1           0 LOAD_NAME                0 (b)
              2 UNARY_INVERT
              4 LOAD_NAME                0 (b)
              6 UNARY_NEGATIVE
              8 BINARY_XOR

#Mark Meyer: b - (b - 1) * -1 (7 opcodes)
  1           0 LOAD_NAME                0 (b)
              3 LOAD_NAME                0 (b)
              6 LOAD_CONST               0 (1)
              9 BINARY_SUBTRACT
             10 LOAD_CONST               1 (-1)
             13 BINARY_MULTIPLY
             14 BINARY_SUBTRACT

#Sayse: {True: 1, False: -1}[b] (7 opcodes)
  1           0 LOAD_CONST               0 (True)
              3 LOAD_CONST               1 (1)
              6 LOAD_CONST               2 (False)
              9 LOAD_CONST               3 (-1)
             12 BUILD_MAP                2
             15 LOAD_NAME                0 (b)
             18 BINARY_SUBSCR

#deceze: {True: 1}.get(b, -1) (7 opcodes, 1 function call)
  1           0 LOAD_CONST               0 (True)
              3 LOAD_CONST               1 (1)
              6 BUILD_MAP                1
              9 LOAD_ATTR                0 (get)
             12 LOAD_NAME                1 (b)
             15 LOAD_CONST               2 (-1)
             18 CALL_FUNCTION            2 (2 positional, 0 keyword pair)

#Glannis Clipper: [-1, 1][int(b)] (7 opcodes, 1 function call)
  1           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               0 (1)
              6 BUILD_LIST               2
              9 LOAD_NAME                0 (int)
             12 LOAD_NAME                1 (b)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 BINARY_SUBSCR

#divyang4481: 2 * int(b) - 1 (7 opcodes, 1 function call)
  1           0 LOAD_CONST               0 (2)
              3 LOAD_NAME                0 (int)
              6 LOAD_NAME                1 (b)
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 BINARY_MULTIPLY
             13 LOAD_CONST               1 (1)
             16 BINARY_SUBTRACT
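Opcode counts are only a rough proxy for speed, so here is a small timeit sketch for a few of the candidates (absolute numbers will vary by machine and Python version):

import timeit

setup = "b = True"
for expr in ("(-1, 1)[b]", "-(-1)**b", "b - (not b)", "2 * b - 1"):
    # Take the best of several repeats to reduce noise.
    t = min(timeit.repeat(expr, setup=setup, number=1_000_000, repeat=5))
    print(f"{expr:15} {t:.3f} s per million evaluations")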
You can exploit the fact that in Python, the type bool is numeric:
>>> True == 1
True
>>> False == 0
True
So the expression 2 * b - 1 gives the desired results:
>>> def without_branching(b):
...     return 2 * b - 1
...
>>> without_branching(True)
1
>>> without_branching(False)
-1
However, it's arguable whether even this is really "branchless". It will be compiled to Python bytecode with no conditional jumps, but the bytecode interpreter will certainly do some conditional jumps in order to execute it: at the very least, it has to check which opcodes to execute, what types the operands of * and - have, and so on.
Maybe we can use a list in a way like this:
[None, True, False][1]
# output True
[None, True, False][-1]
# output False
UPDATE: And the opposite way as mentioned in comments:
[-1, 1][int(False)]
# output -1
[-1, 1][int(True)]
# output 1
UPDATE: Or even simpler with the use of a tuple and without the need of int() conversion (as mentioned in comments too):
(-1, 1)[False]
# output -1
(-1, 1)[True]
# output 1
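A small sanity check confirms that the branchless candidates above all agree for both inputs:

# Each expression should map False -> -1 and True -> +1.
for b in (False, True):
    results = {(-1, 1)[b], -(-1)**b, b - (not b), 2 * b - 1, ~b ^ -b}
    assert results == {1 if b else -1}, results
print("all candidates agree")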

Is it possible to call a function from within a list comprehension without the overhead of calling the function?

In this trivial example, I want to factor out the i < 5 condition of a list comprehension into its own function. I also want to eat my cake and have it too, and avoid the overhead of the CALL_FUNCTION bytecode and of creating a new frame in the Python virtual machine.
Is there any way to factor out the condition inside a list comprehension into a new function but somehow get a disassembled result that avoids the large overhead of CALL_FUNCTION?
import dis
import sys
import timeit

def my_filter(n):
    return n < 5

def a():
    # list comprehension with function call
    return [i for i in range(10) if my_filter(i)]

def b():
    # list comprehension without function call
    return [i for i in range(10) if i < 5]

assert a() == b()
>>> sys.version_info[:]
(3, 6, 5, 'final', 0)
>>> timeit.timeit(a)
1.2616060493517098
>>> timeit.timeit(b)
0.685117881097812
>>> dis.dis(a)
3 0 LOAD_CONST 1 (<code object <listcomp> at 0x0000020F4890B660, file "<stdin>", line 3>)
# ...
>>> dis.dis(b)
3 0 LOAD_CONST 1 (<code object <listcomp> at 0x0000020F48A42270, file "<stdin>", line 3>)
# ...
# list comprehension with function call
# big overhead with that CALL_FUNCTION at address 12
>>> dis.dis(a.__code__.co_consts[1])
  3           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (i)
              8 LOAD_GLOBAL              0 (my_filter)
             10 LOAD_FAST                1 (i)
             12 CALL_FUNCTION            1
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (i)
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
# list comprehension without function call
>>> dis.dis(b.__code__.co_consts[1])
  3           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LOAD_CONST               0 (5)
             12 COMPARE_OP               0 (<)
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (i)
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
I'm willing to take a hacky solution that I would never use in production, like somehow replacing the bytecode at run time.
In other words, is it possible to replace a's addresses 8, 10, and 12 with b's 8, 10, and 12 at runtime?
Consolidating all of the excellent answers in the comments into one.
As georg says, this sounds like you are looking for a way to inline a function or an expression. There is no such thing in CPython, although attempts have been made: https://bugs.python.org/issue10399
Therefore, along the lines of "metaprogramming", you can build the lambda inline and eval it:
from typing import Callable
import dis

def b():
    # list comprehension without function call
    return [i for i in range(10) if i < 5]

def gen_list_comprehension(expr: str) -> Callable:
    return eval(f"lambda: [i for i in range(10) if {expr}]")

a = gen_list_comprehension("i < 5")
dis.dis(a.__code__.co_consts[1])
print("=" * 10)
dis.dis(b.__code__.co_consts[1])
which when run under 3.7.6 gives:
  6           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LOAD_CONST               0 (5)
             12 COMPARE_OP               0 (<)
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (i)
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
==========
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LOAD_CONST               0 (5)
             12 COMPARE_OP               0 (<)
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (i)
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
From a security standpoint, "eval" is dangerous, although here it is less so because what you can do inside a lambda is limited, and what can be done in an IfExp expression is even more limited. Still, it is dangerous: it could, for example, call a function that does evil things.
However, if you want the same effect in a more secure way, you can modify ASTs instead of working with strings. I find that a lot more cumbersome, though.
A hybrid approach would be to call ast.parse() and check the result. For example:
import ast

def is_cond_str(s: str) -> bool:
    # Accept only strings whose top-level expression is a comparison.
    try:
        mod_ast = ast.parse(s)
        expr_ast = mod_ast.body[0]
        if not isinstance(expr_ast, ast.Expr):
            return False
        compare_ast = expr_ast.value
        if not isinstance(compare_ast, ast.Compare):
            return False
        return True
    except Exception:
        return False
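As a rough usage sketch, combining this check with gen_list_comprehension from above (the second string is just an illustrative input that the check rejects):

# Only strings whose top-level expression is a comparison are accepted.
for candidate in ("i < 5", "print('not a comparison')"):
    if is_cond_str(candidate):
        f = gen_list_comprehension(candidate)
        print(candidate, "->", f())
    else:
        print(candidate, "-> rejected")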
This approach is a little more secure, but there may still be calls to evil functions in the condition, so you could keep checking further. Again, I find this a little tedious.
Coming from the other direction of starting off with bytecode, there is my cross-version assembler; see https://pypi.org/project/xasm/

What's the difference between in-place assignment and assignment using the variable's name again?

In Python, while assigning a value to a variable, we can either do:
variable = variable + 20
or
variable += 20
While I do understand that both operations are semantically the same, i.e., they achieve the same goal of increasing the previous value of variable by 20, I was wondering if there are subtle run-time performance differences between the two, or any other slight differences that might make one preferable to the other.
Is there any such difference, or are they exactly the same?
If there is any difference, is it the same for other languages such as C++?
Thanks.
Perhaps this can help you understand better:
import dis

def a():
    x = 0
    x += 20
    return x

def b():
    x = 0
    x = x + 20
    return x

print('In place add')
dis.dis(a)
print('Binary add')
dis.dis(b)
We get the following outputs:
In place add
  4           0 LOAD_CONST               1 (0)
              3 STORE_FAST               0 (x)

  5           6 LOAD_FAST                0 (x)
              9 LOAD_CONST               2 (20)
             12 INPLACE_ADD
             13 STORE_FAST               0 (x)

  6          16 LOAD_FAST                0 (x)
             19 RETURN_VALUE
Binary add
  9           0 LOAD_CONST               1 (0)
              3 STORE_FAST               0 (x)

 10           6 LOAD_FAST                0 (x)
              9 LOAD_CONST               2 (20)
             12 BINARY_ADD
             13 STORE_FAST               0 (x)

 11          16 LOAD_FAST                0 (x)
             19 RETURN_VALUE
You could run each one in a loop a thousand or so times with a timer to compare performance, but the main difference is the one shown above. I suppose binary add should be faster, though.
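For instance, a quick timeit sketch (exact numbers depend on the interpreter and machine; for plain integers the difference is usually negligible):

import timeit

inplace = timeit.timeit("x = 0\nx += 20", number=1_000_000)
binary = timeit.timeit("x = 0\nx = x + 20", number=1_000_000)
print("x += 20    :", round(inplace, 3), "s")
print("x = x + 20 :", round(binary, 3), "s")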

Fastest way to swap elements in Python list

Is there any any faster way to swap two list elements in Python than
L[a], L[b] = L[b], L[a]
or would I have to resort to Cython or Weave or the like?
Looks like the Python compiler optimizes out the temporary tuple with this construct:
code:
import dis

def swap1():
    a = 5
    b = 4
    a, b = b, a

def swap2():
    a = 5
    b = 4
    c = a
    a = b
    b = c

print('swap1():')
dis.dis(swap1)
print('swap2():')
dis.dis(swap2)
output:
swap1():
  6           0 LOAD_CONST               1 (5)
              3 STORE_FAST               0 (a)

  7           6 LOAD_CONST               2 (4)
              9 STORE_FAST               1 (b)

  8          12 LOAD_FAST                1 (b)
             15 LOAD_FAST                0 (a)
             18 ROT_TWO
             19 STORE_FAST               0 (a)
             22 STORE_FAST               1 (b)
             25 LOAD_CONST               0 (None)
             28 RETURN_VALUE
swap2():
 11           0 LOAD_CONST               1 (5)
              3 STORE_FAST               0 (a)

 12           6 LOAD_CONST               2 (4)
              9 STORE_FAST               1 (b)

 13          12 LOAD_FAST                0 (a)
             15 STORE_FAST               2 (c)

 14          18 LOAD_FAST                1 (b)
             21 STORE_FAST               0 (a)

 15          24 LOAD_FAST                2 (c)
             27 STORE_FAST               1 (b)
             30 LOAD_CONST               0 (None)
             33 RETURN_VALUE
Two loads, a ROT_TWO, and two saves, versus three loads and three saves. You are unlikely to find a faster mechanism.
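As a rough timeit comparison of the two approaches (numbers will vary; both are dominated by interpreter overhead):

import timeit

setup = "L = list(range(100)); a, b = 3, 97"
tuple_swap = timeit.timeit("L[a], L[b] = L[b], L[a]", setup=setup, number=1_000_000)
temp_swap = timeit.timeit("t = L[a]; L[a] = L[b]; L[b] = t", setup=setup, number=1_000_000)
print("tuple swap:", round(tuple_swap, 3), "s")
print("temp swap :", round(temp_swap, 3), "s")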
If you could post a representative code sample, we could do a better job of benchmarking your options. FWIW, for the following dumb benchmark, I get about a 3x speedup with Shed Skin and a 10x speedup with PyPy.
from time import time

def swap(L):
    for i in xrange(1000000):
        for b, a in enumerate(L):
            L[a], L[b] = L[b], L[a]

def main():
    start = time()
    L = list(reversed(range(100)))
    swap(L[:])
    print time() - start
    return L

if __name__ == "__main__":
    print len(main())

# for shedskin:
# shedskin -b -r -e listswap.py && make
# python -c "import listswap; print len(listswap.main())"
I tried this as a simple way to swap two elements (here the first and the third) in a list:
lst = [23, 65, 19, 90]
pos1 = lst.pop(0)      # remove 23; lst is now [65, 19, 90]
pos2 = lst.pop(1)      # remove 19; lst is now [65, 90]
lst.insert(0, pos2)    # put 19 where 23 was: [19, 65, 90]
lst.insert(2, pos1)    # put 23 where 19 was: [19, 65, 23, 90]
print(lst)
I found this to be a straightforward way to swap the first and last elements of a list:
mylist = [11, 23, 5, 8, 13, 17]
first_el = mylist.pop(0)     # remove 11; mylist is now [23, 5, 8, 13, 17]
last_el = mylist.pop(-1)     # remove 17; mylist is now [23, 5, 8, 13]
mylist.insert(0, last_el)    # [17, 23, 5, 8, 13]
mylist.append(first_el)      # [17, 23, 5, 8, 13, 11]
