Python Class Stack Frame Limitations

If I have a class of the following format:
class TestClass:
    def __init__(self):
        self.value = 0

    def getNewObject(self):
        return TestClass()
Is there a limitation to the number of times I can call the function? For example:
obj = TestClass()
obj.getNewObject().getNewObject()
Is there a limitation to how many times I can call getNewObject() on the return value of getNewObject()? If so, what factors affect this?

I doubt it. One reason for my doubt: if we have this function:
def test(obj):
    obj.getNewObject().getNewObject().getNewObject()
And we disassemble it:
import dis
dis.dis(test)
We get this:
2 0 LOAD_FAST 0 (obj)
3 LOAD_ATTR 0 (getNewObject)
6 CALL_FUNCTION 0
9 LOAD_ATTR 0 (getNewObject)
12 CALL_FUNCTION 0
15 LOAD_ATTR 0 (getNewObject)
18 CALL_FUNCTION 0
21 POP_TOP
22 LOAD_CONST 0 (None)
25 RETURN_VALUE
That's just repetitions of LOAD_ATTR followed by CALL_FUNCTION. I can't imagine that requiring much memory or other resources to manage, so there is probably no limit.

There is a recursion limit in Python (adjustable via sys.setrecursionlimit), but it is unrelated here. Each call in the chain is made only after the previous one has returned, so every call happens at the same stack depth (directly from your user code) and the calls never nest. You might hit a line-length limit or something similar, especially in an interactive Python shell, but that is a separate issue.
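A quick way to convince yourself of this is to make far more chained calls in a loop than the recursion limit would ever allow for nested calls (a minimal sketch; TestClass is the class from the question):
import sys

class TestClass:
    def __init__(self):
        self.value = 0

    def getNewObject(self):
        return TestClass()

print(sys.getrecursionlimit())  # e.g. 1000 -- only nested calls count against this limit

obj = TestClass()
for _ in range(100000):         # far more chained calls than the recursion limit
    obj = obj.getNewObject()    # each call returns before the next one starts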

Related

Use type comments to narrow typing of already declared Python variable

How can one use type comments in Python to change or narrow the type of an already declared variable, in such a way that PyCharm or other type-aware systems understand the new type?
For instance, I might have two classes:
class A:
    is_b = False
    ...

class B(A):
    is_b = True

    def flummox(self):
        return '?'
and another function elsewhere:
def do_something_to_A(a_in: A):
    ...
    if a_in.is_b:
        assert isinstance(a_in, B)  # THIS IS THE LINE...
        a_in.flummox()
As long as I have the assert statement, PyCharm will understand that I've narrowed a_in to be of class B, and not complain about .flummox(). Without it, errors/warnings such as a_in has no method flummox will appear.
The question I have is: is there a PEP 484 (or successor) way of showing that a_in (which might have originally been of type A or B or something else) is now of type B, without having the assert statement? The statement b_in: B = a_in also gives type errors.
In TypeScript I could do something like this:
if (a_in.is_b) {
    const b_in = <B><any>a_in;
    b_in.flummox();
}
// or
if (a_in.is_b) {
    (a_in as B).flummox();
}
The two main reasons I don't want to use the assert line are (1) speed is very important in this part of the code, and an extra isinstance call every time the line runs slows it down too much, and (2) a project code style that forbids bare assert statements.
So long as you are using Python 3.6+, you can "re-annotate" the type of a variable arbitrarily using the same syntax as you would use to "declare" the type of a variable without initializing it (PEP 526).
In the example you have provided, the following snippet has the behavior you expect:
def do_something_to_A(a_in: A):
    ...
    if a_in.is_b:
        a_in: B
        a_in.flummox()
I have tested that this technique is properly detected by PyCharm 2019.2.
It is worth noting that this incurs no runtime cost, since the same bytecode is generated with or without the added annotation statement. Given the following definitions,
def do_something_with_annotation(a_in: A):
    if a_in.is_b:
        a_in: B
        a_in.flummox()

def do_something_without_annotation(a_in: A):
    if a_in.is_b:
        a_in.flummox()
dis produces the following bytecode:
>>> dis.dis(do_something_with_annotation)
3 0 LOAD_FAST 0 (a_in)
2 LOAD_ATTR 0 (is_b)
4 POP_JUMP_IF_FALSE 14
5 6 LOAD_FAST 0 (a_in)
8 LOAD_ATTR 1 (flummox)
10 CALL_FUNCTION 0
12 POP_TOP
>> 14 LOAD_CONST 0 (None)
16 RETURN_VALUE
>>> dis.dis(do_something_without_annotation)
3 0 LOAD_FAST 0 (a_in)
2 LOAD_ATTR 0 (is_b)
4 POP_JUMP_IF_FALSE 14
4 6 LOAD_FAST 0 (a_in)
8 LOAD_ATTR 1 (flummox)
10 CALL_FUNCTION 0
12 POP_TOP
>> 14 LOAD_CONST 0 (None)
16 RETURN_VALUE
As a side note, you could also keep the assertion statements and disable assertions in your production environment by invoking the interpreter with the -O flag. This may or may not be considered more readable by your colleagues, depending on their familiarity with type hinting in Python.
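A quick way to see this effect (a minimal sketch, independent of the classes above) is to disassemble a function containing an assert with and without the -O flag:
# python this_script.py     -> __debug__ is True, the assert is compiled in
# python -O this_script.py  -> __debug__ is False, the assert is removed at compile time
import dis

def f(x):
    assert x > 0

dis.dis(f)  # under -O, no comparison or raise for AssertionError appears in the output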

Python: tracing lines of source code that a Python function activates

I am aware of how to find the location of a Python function in its source code with the inspect module via:
import inspect
inspect.getsourcefile(random_function)
However, while a python function is running, or after it has run, how would one find all of the pieces of the source code it utilized/referenced during its individual run?
For ex., if I ran random_function(arg1=1, arg2=2) vs. random_function(arg1=1, arg5=3.5), I would like to know which different parts of the module got used each time.
Is there anything like the example here?
You can do this by inspecting the bytecode with the dis module. For instance:
import dis
import numpy as np

def main():
    array = np.zeros(shape=5)

if __name__ == '__main__':
    print(dis.dis(main))
result:
5 0 LOAD_GLOBAL 0 (np)
2 LOAD_ATTR 1 (zeros)
4 LOAD_CONST 1 (5)
6 LOAD_CONST 2 (('shape',))
8 CALL_FUNCTION_KW 1
10 STORE_FAST 0 (array)
12 LOAD_CONST 0 (None)
14 RETURN_VALUE
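If what you actually want is the set of lines that executed at runtime (rather than the static bytecode), a different standard-library approach is a trace hook via sys.settrace; a minimal sketch, with random_function standing in for whatever function you are inspecting:
import sys

executed = set()

def tracer(frame, event, arg):
    if event == "line":
        executed.add((frame.f_code.co_filename, frame.f_lineno))
    return tracer

sys.settrace(tracer)
random_function(arg1=1, arg2=2)  # the call whose executed lines we want to record
sys.settrace(None)

for filename, lineno in sorted(executed):
    print(filename, lineno)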

Is there an advantage of using the property decorator compared to the property class?

I can see two very similar ways of having properties in Python
(a) Property class
class Location(object):
    def __init__(self, longitude, latitude):
        self.set_latitude(latitude)
        self.set_longitude(longitude)

    def set_latitude(self, latitude):
        if not (-90 <= latitude <= 90):
            raise ValueError('latitude was {}, but has to be in [-90, 90]'
                             .format(latitude))
        self._latitude = latitude

    def set_longitude(self, longitude):
        if not (-180 <= longitude <= 180):
            raise ValueError('longitude was {}, but has to be in [-180, 180]'
                             .format(longitude))
        self._longitude = longitude

    def get_longitude(self):
        return self._longitude

    def get_latitude(self):
        return self._latitude

    latitude = property(get_latitude, set_latitude)
    longitude = property(get_longitude, set_longitude)
(b) Property decorator
class Location(object):
    def __init__(self, longitude, latitude):
        self.latitude = latitude
        self.longitude = longitude

    @property
    def latitude(self):
        """I'm the 'latitude' property."""
        return self._latitude

    @property
    def longitude(self):
        """I'm the 'longitude' property."""
        return self._longitude

    @latitude.setter
    def latitude(self, latitude):
        if not (-90 <= latitude <= 90):
            raise ValueError('latitude was {}, but has to be in [-90, 90]'
                             .format(latitude))
        self._latitude = latitude

    @longitude.setter
    def longitude(self, longitude):
        if not (-180 <= longitude <= 180):
            raise ValueError('longitude was {}, but has to be in [-180, 180]'
                             .format(longitude))
        self._longitude = longitude
Question
Are those two pieces of code identical (e.g. bytecode wise)? Do they show the same behavior?
Are there any official guides on which "style" to use?
Are there any real advantages of one over the other?
What I've tried
py_compile + uncompyle6
I've compiled both:
>>> import py_compile
>>> py_compile.compile('test.py')
and then decompiled both with uncompyle6. But that just returned exactly what I started with (with slightly different formatting).
import + dis
I tried
import dis

import test   # (a)
import test2  # (b)

dis.dis(test)
dis.dis(test2)
I'm super confused by the output of test2:
Disassembly of Location:
Disassembly of __init__:
13 0 LOAD_FAST 2 (latitude)
2 LOAD_FAST 0 (self)
4 STORE_ATTR 0 (latitude)
14 6 LOAD_FAST 2 (latitude)
8 LOAD_FAST 0 (self)
10 STORE_ATTR 1 (longitude)
12 LOAD_CONST 0 (None)
14 RETURN_VALUE
whereas the first one was way bigger:
Disassembly of Location:
Disassembly of __init__:
13 0 LOAD_FAST 0 (self)
2 LOAD_ATTR 0 (set_latitude)
4 LOAD_FAST 2 (latitude)
6 CALL_FUNCTION 1
8 POP_TOP
14 10 LOAD_FAST 0 (self)
12 LOAD_ATTR 1 (set_longitude)
14 LOAD_FAST 1 (longitude)
16 CALL_FUNCTION 1
18 POP_TOP
20 LOAD_CONST 0 (None)
22 RETURN_VALUE
Disassembly of set_latitude:
17 0 LOAD_CONST 3 (-90)
2 LOAD_FAST 1 (latitude)
4 DUP_TOP
6 ROT_THREE
8 COMPARE_OP 1 (<=)
10 JUMP_IF_FALSE_OR_POP 18
12 LOAD_CONST 1 (90)
14 COMPARE_OP 1 (<=)
16 JUMP_FORWARD 4 (to 22)
>> 18 ROT_TWO
20 POP_TOP
>> 22 POP_JUMP_IF_TRUE 38
18 24 LOAD_GLOBAL 0 (ValueError)
26 LOAD_CONST 2 ('latitude was {}, but has to be in [-90, 90]')
28 LOAD_ATTR 1 (format)
30 LOAD_FAST 1 (latitude)
32 CALL_FUNCTION 1
34 CALL_FUNCTION 1
36 RAISE_VARARGS 1
19 >> 38 LOAD_FAST 1 (latitude)
40 LOAD_FAST 0 (self)
42 STORE_ATTR 2 (latitude)
44 LOAD_CONST 0 (None)
46 RETURN_VALUE
Disassembly of set_longitude:
22 0 LOAD_CONST 3 (-180)
2 LOAD_FAST 1 (longitude)
4 DUP_TOP
6 ROT_THREE
8 COMPARE_OP 1 (<=)
10 JUMP_IF_FALSE_OR_POP 18
12 LOAD_CONST 1 (180)
14 COMPARE_OP 1 (<=)
16 JUMP_FORWARD 4 (to 22)
>> 18 ROT_TWO
20 POP_TOP
>> 22 POP_JUMP_IF_TRUE 38
23 24 LOAD_GLOBAL 0 (ValueError)
26 LOAD_CONST 2 ('longitude was {}, but has to be in [-180, 180]')
28 LOAD_ATTR 1 (format)
30 LOAD_FAST 1 (longitude)
32 CALL_FUNCTION 1
34 CALL_FUNCTION 1
36 RAISE_VARARGS 1
24 >> 38 LOAD_FAST 1 (longitude)
40 LOAD_FAST 0 (self)
42 STORE_ATTR 2 (longitude)
44 LOAD_CONST 0 (None)
46 RETURN_VALUE
Where does that difference come from? And where is the value range check in the output for the decorator version (test2)?
You want to use the decorator, always. There is no advantage to the other syntax, and only disadvantages.
The point of decorators
That's because the decorator syntax was invented specifically to avoid the other syntax. Any examples you find of the name = property(...) variety are usually in code that predates decorators.
Decorator syntax is syntactic sugar; the form
@decorator
def functionname(...):
    # ...
is executed a lot like
def functionname(...):
    # ...
functionname = decorator(functionname)
without functionname being assigned to twice (the def functionname(...) part creates a function object and assigns to functionname normally, but with a decorator the function object is created and passed directly to the decorator object).
Python added this feature because when your function body is long, you can't easily see that the function has been wrapped with a decorator. You'd have to scroll down past the function definition to see that, and that's not very helpful when almost everything else you'd want to know about a function is right at the top; the arguments, the name, the docstring are right there.
From the original PEP 318 – Decorators for Functions and Methods specification:
The current method of applying a transformation to a function or method places the actual transformation after the function body. For large functions this separates a key component of the function's behavior from the definition of the rest of the function's external interface.
[...]
This becomes less readable with longer methods. It also seems less than pythonic to name the function three times for what is conceptually a single declaration.
and under Design Goals:
The new syntax should
[...]
move from the end of the function, where it's currently hidden, to the front where it is more in your face
So using
@property
def latitude(self):
    # ...

@latitude.setter
def latitude(self, latitude):
    # ...
is far more readable and self-documenting than
def get_latitude(self):
    # ...

def set_latitude(self, latitude):
    # ...

latitude = property(get_latitude, set_latitude)
No namespace pollution
Next, because the @property decorator replaces the function object you decorate with the decoration result (a property instance), you also avoid namespace pollution. Without @property and @<name>.setter and @<name>.deleter, you have to add extra, separate names (get_latitude, set_latitude, and so on) to your class definition that then no-one will ever use:
>>> [n for n in sorted(vars(Location)) if n[:2] != '__']
['get_latitude', 'get_longitude', 'latitude', 'longitude', 'set_latitude', 'set_longitude']
Imagine a class with 5, 10, or even more property definitions. Developers less familiar with the project, working with an auto-completing IDE, will surely get confused by the difference between get_latitude, latitude and set_latitude, and you end up with code that mixes styles and makes it harder to move away from exposing these methods at the class level.
Sure, you can use del get_latitude, set_latitude right after the latitude = property(...) assignment, but that's yet more extra code to execute for no real purpose.
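For completeness, that clean-up looks like this (a sketch of the old syntax, with the accessor bodies reduced to the bare minimum):
class Location(object):
    def get_latitude(self):
        return self._latitude

    def set_latitude(self, latitude):
        self._latitude = latitude

    latitude = property(get_latitude, set_latitude)
    del get_latitude, set_latitude  # remove the helper names from the class namespace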
Confusing method names
Although you can avoid having to prefix the accessor names with get_ and set_ or otherwise differentiate between the names to create a property() object from them, that's still how almost all code that doesn't use the @property decorator syntax ends up naming the accessor methods.
And that can lead to some confusion in tracebacks; an exception raised in one of the accessor methods leads to a traceback with get_latitude or set_latitude in the name, while the preceding line used object.latitude. It may not always be clear to the Python property novice how the two are connected, especially if they missed the latitude = property(...) line further down; see above.
Accessing the accessors, and how to inherit
You may point out that you may need access to those functions anyway; for example, when overriding just the getter or the setter for the property in a subclass, while inheriting the other accessor.
But the property object, when accessed on the class, already gives you references to the accessors, via the .fget, .fset and .fdel attributes:
>>> Location.latitude
<property object at 0x10d1c3d18>
>>> Location.latitude.fget
<function Location.get_latitude at 0x10d1c4488>
>>> Location.latitude.fset
<function Location.set_latitude at 0x10d195ea0>
and you can reuse the @<name>.getter / @<name>.setter / @<name>.deleter syntax in a subclass without having to remember to create a new property object!
With the old syntax, it was commonplace to try to override just one of the accessors:
class SpecialLocation(Location):
    def set_latitude(self, latitude):
        # ...
and then wonder why it would not be picked up by the inherited property object.
With the decorator syntax, you'd use:
class SpecialLocation(Location):
    @Location.latitude.setter
    def latitude(self, latitude):
        # ...
and the SpecialLocation subclass is then given a new property() instance with the getter inherited from Location, and with a new setter.
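For comparison, with the old syntax you would have to remember to rebuild the property by hand in the subclass, something like this sketch (reusing the parent's getter via .fget):
class SpecialLocation(Location):
    def set_latitude(self, latitude):
        # new validation or behaviour here
        self._latitude = latitude

    latitude = property(Location.latitude.fget, set_latitude)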
TLDR
Use the decorator syntax.
It is self-documenting
It avoids namespace pollution
It makes inheriting accessors from properties cleaner and more straightforward
The results of the two versions of your code will be almost exactly the same. The property descriptor you have at the end will be functionally identical in both cases. The only difference in the descriptors will be in the function names you can access if you really try (via Location.longitude.fset.__name__), and that you might see in an exception traceback, if something goes wrong.
The only other difference is the presence of the get_foo and set_foo methods after you're done. When you use #property, you won't have those methods cluttering up the namespace. If you build the property object yourself manually, they will remain in the class namespace, and so you can call them directly if you really want to instead of using normal attribute access via the property object.
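Concretely, with the manual property() version of Location from the question, both spellings reach the same underlying code (a small sketch):
loc = Location(longitude=0, latitude=0)

loc.set_latitude(45)   # only possible with the manual property() version
loc.latitude = 45      # works with either version, goes through the property setter
print(loc.get_latitude(), loc.latitude)  # both read the same underlying _latitude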
Usually the @property syntax is better since it hides the methods, which you usually don't need. The only reason I can think of that you might want to expose them is if you expect to pass the methods as callbacks to some other function (e.g. some_function(*args, callback=foo.set_longitude)). You could just use a lambda for the callback, though (lambda x: setattr(foo, "longitude", x)), so I don't think it's worth polluting a nice API with extraneous getter and setter methods just for this corner case.

Is there a way to make dis.dis() print code objects recursively?

I've been using the dis module to observe CPython bytecode. But lately, I've noticed some inconvenient behavior of dis.dis().
Take this example for instance: I first define a function multiplier with a nested function inner inside of it:
>>> def multiplier(n):
...     def inner(multiplicand):
...         return multiplicand * n
...     return inner
...
>>>
I then use dis.dis() to disassemble it:
>>> from dis import dis
>>> dis(multiplier)
2 0 LOAD_CLOSURE 0 (n)
3 BUILD_TUPLE 1
6 LOAD_CONST 1 (<code object inner at 0x7ff6a31d84b0, file "<pyshell#12>", line 2>)
9 LOAD_CONST 2 ('multiplier.<locals>.inner')
12 MAKE_CLOSURE 0
15 STORE_FAST 1 (inner)
4 18 LOAD_FAST 1 (inner)
21 RETURN_VALUE
>>>
As you can see, it disassembled the top-level code object fine. However, it did not disassemble inner. It simply showed that it created a code object named inner and displayed the default (uninformative) __repr__() for code objects.
Is there a way I can make dis.dis() print the code objects recursively? That is, if I have nested code objects, it will print the bytecode for all of the code objects out, rather than stopping at the top-level code object. I'd mainly like this feature for things such as decorators, closures, or generator comprehensions.
It appears that the latest version of Python - 3.7 alpha 1 - has exactly the behavior I want from dis.dis():
>>> def func(a):
...     def ifunc(b):
...         return b + 10
...     return ifunc
...
>>> dis(func)
2 0 LOAD_CONST 1 (<code object ifunc at 0x7f199855ac90, file "python", line 2>)
2 LOAD_CONST 2 ('func.<locals>.ifunc')
4 MAKE_FUNCTION 0
6 STORE_FAST 1 (ifunc)
4 8 LOAD_FAST 1 (ifunc)
10 RETURN_VALUE
Disassembly of <code object ifunc at 0x7f199855ac90, file "python", line 2>:
3 0 LOAD_FAST 0 (b)
2 LOAD_CONST 1 (10)
4 BINARY_ADD
6 RETURN_VALUE
The What’s New In Python 3.7 article makes note of this:
The dis() function now is able to disassemble nested code objects (the code of comprehensions, generator expressions and nested functions, and the code used for building nested classes). (Contributed by Serhiy Storchaka in bpo-11822.)
However, besides Python 3.7 not being formally released yet, what if you don't want or cannot use Python 3.7? Are there ways to accomplish this in earlier versions of Python such as 3.5 or 2.7 using the old dis.dis()?
You could do something like this (Python 3):
import dis

def recursive_dis(code):
    print(code)
    dis.dis(code)
    for obj in code.co_consts:
        if isinstance(obj, type(code)):
            print()
            recursive_dis(obj)
https://repl.it/#solly_ucko/Recursive-dis
Note that you have to call it with f.__code__ instead of just f. For example:
def multiplier(n):
    def inner(multiplicand):
        return multiplicand * n
    return inner

recursive_dis(multiplier.__code__)
First off, if you need this for anything other than interactive use, I would recommend just copying the code from the Python 3.7 sources and backporting it (hopefully that isn't difficult).
For interactive use, an idea would be to use one of the ways of accessing an object by its memory address to grab the code object, since that address is printed in the dis output.
For example:
>>> def func(a):
...     def ifunc(b):
...         return b + 10
...     return ifunc
...
>>> import dis
>>> dis.dis(func)
2 0 LOAD_CONST 1 (<code object ifunc at 0x10cabda50, file "<stdin>", line 2>)
3 LOAD_CONST 2 ('func.<locals>.ifunc')
6 MAKE_FUNCTION 0
9 STORE_FAST 1 (ifunc)
4 12 LOAD_FAST 1 (ifunc)
15 RETURN_VALUE
Here I copy-paste the memory address of the code object printed above
>>> import ctypes
>>> c = ctypes.cast(0x10cabda50, ctypes.py_object).value
>>> dis.dis(c)
3 0 LOAD_FAST 0 (b)
3 LOAD_CONST 1 (10)
6 BINARY_ADD
7 RETURN_VALUE
WARNING: the ctypes.cast line will segfault the interpreter if you pass it something that doesn't exist in memory (say, because it's been garbage collected). Some of the other solutions from the above referenced question may work better (I tried the gc one but it didn't seem to be able to find code objects).
This also means that this won't work if you pass dis a string, because the internal code objects will already be garbage collected by the time you try to access them. You need to either pass it a real Python object, or, if you have a string, compile() it first.
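For example, with a source string you can compile() it first so the nested code objects stay alive and reachable (a minimal sketch):
import dis
import types

src = """
def outer():
    def inner():
        return 42
    return inner
"""
code = compile(src, "<example>", "exec")  # keeps the nested code objects alive
dis.dis(code)                             # the address printed for 'outer' can now be used with the ctypes trick

# or reach the nested code objects directly, without any memory-address tricks:
for const in code.co_consts:
    if isinstance(const, types.CodeType):
        dis.dis(const)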

Disassemble python byte-code

I'm using Python 2.7 and I have a very simple script:
def simple():
    print("It's simple!")


x = "Come on"
Then, in one project, I load this script's compiled .pyc like this in order to disassemble it:
import marshal
import dis

pyc_file = open('./simple.pyc', 'rb')
magic = pyc_file.read(4)
date = pyc_file.read(4)
code_object = marshal.load(pyc_file)
pyc_file.close()

dis.dis(code_object)
and get this output:
1 0 LOAD_CONST 0 (<code object simple at 0x7efc5d1bfc30, file "/home/svintsov/PycharmProjects/www.artour.com/simple.py", line 1>)
3 MAKE_FUNCTION 0
6 STORE_NAME 0 (simple)
4 9 LOAD_CONST 1 ('Come on')
12 STORE_NAME 1 (x)
15 LOAD_CONST 2 (None)
18 RETURN_VALUE
But I also tried another way of disassembling in another project:
import dis
s = __import__("simple")
dis.dis(s)
which gives a different output:
Disassembly of simple:
2 0 LOAD_CONST 1 ("It's simple!")
3 PRINT_ITEM
4 PRINT_NEWLINE
5 LOAD_CONST 0 (None)
8 RETURN_VALUE
What's the reason these outputs are different? One of them doesn't seem to show one of the string literals at all.
OK, I'm actually going to answer my own question.
As the Python 2.x documentation says:
dis.dis([bytesource])
Disassemble the bytesource object. bytesource can denote either a module, a class, a method, a function, or a code object. For a module, it disassembles all functions. For a class, it disassembles all methods. For a single code sequence, it prints one line per bytecode instruction. If no object is provided, it disassembles the last traceback.
So this function does different kinds of disassembly for modules and for "raw" code objects, even though the underlying bytecode (the .pyc file) is the same: given the imported module, dis.dis recurses into the functions defined in it (so you see the body of simple), while given the raw top-level code object it only disassembles the module-level code, and the body of simple stays hidden inside a nested code object constant.
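If you want the body of simple from the raw .pyc route as well, you can disassemble the nested code objects stored in co_consts yourself; a minimal sketch building on the loading code above:
import types

dis.dis(code_object)                 # module-level bytecode, as before
for const in code_object.co_consts:
    if isinstance(const, types.CodeType):
        print("Disassembly of %s:" % const.co_name)
        dis.dis(const)               # prints the body of simple()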
