Self-deleting class in Python - python

EDIT: Disclaimer - I don't mean deletion in the sense that applies to languages that aren't memory-managed (e.g. free in C++). Deletion here is to be understood as the fact that the superclass doesn't have the subclass as one of its subclasses anymore after it's been deleted.
In Python, you can delete a class (yes I do mean a class, not an instance) by doing the following:
class Super:
    ...

class DeleteMe(Super):
    ...

print(Super.__subclasses__())
# [<class '__main__.DeleteMe'>]
del DeleteMe
import gc
gc.collect() # Force a collection
print(Super.__subclasses__())
# []
I am trying to emulate this behaviour but I want the DeleteMe class to be able to destroy itself. Here is what I've tried:
class Super:
    ...

class DeleteMe(Super):
    def self_delete(self):
        print(self.__class__)
        # <class '__main__.DeleteMe'>, this looks right
        del self.__class__ # this fails
        import gc
        gc.collect()
        print(Super.__subclasses__())
        # [<class '__main__.DeleteMe'>]

DeleteMe().self_delete()
It fails with the following traceback:
Traceback (most recent call last):
File "/Users/rayan/Desktop/test.py", line 10, in <module>
DeleteMe().self_delete()
File "/Users/rayan/Desktop/test.py", line 4, in self_delete
del self.__class__
TypeError: can't delete __class__ attribute
How can I achieve this self-destructing behaviour?
Note: not a duplicate of How to remove classes from __subclasses__?, that question covers the first case where the deletion happens outside of the class

del DeleteMe
This is not deleting the class. This is deleting the name that happens to refer to the class. If there are no other references to the class (and that includes the name you just deleted, any module that's ever imported the class, any instances of the class, and any other places where the class might happen to be stored), then the garbage collector might delete the class when you gc.collect().
Now an instance always knows its own class, via the __class__ attribute. It makes little sense to delete self.__class__, because then what would we be left with? An instance with no class? What can we do with it? We can't call methods on it since those are defined on the class, and we can't do anything object-like on it since it's no longer an instance of object (a superclass of the class we just removed). So really we have a sort of silly looking dictionary that doesn't even do all of the dict things in Python. Hence, disallowed.
You cannot delete data in Python. That's the garbage collector's job. There is no Python equivalent of C's free or C++'s delete. del in Python deletes bindings or dictionary entries. It does not remove data; it removes pointers that happen to point to data.
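The point about names versus objects can be checked with a weak reference. This is a minimal sketch, assuming CPython; make_subclass, alias and class_ref are hypothetical names introduced only for the illustration. The class disappears from Super.__subclasses__() only once the last strong reference to it is gone and the collector has run.

```python
import gc
import weakref


class Super:
    pass


def make_subclass():
    # Hypothetical helper: the subclass is built inside a function so that
    # no module-level name is bound to it automatically.
    class DeleteMe(Super):
        pass
    return DeleteMe


cls = make_subclass()
alias = cls                    # a second name bound to the same class object
class_ref = weakref.ref(cls)   # a weak reference does not keep the class alive

del cls                        # removes one name, not the class
gc.collect()
alive_with_alias = class_ref() is not None

del alias                      # removes the last strong reference
gc.collect()                   # classes sit in reference cycles, so a collection is needed
alive_without_refs = class_ref() is not None

print(alive_with_alias, alive_without_refs, Super.__subclasses__())
```

As long as any instance of the class exists, the instance's __class__ attribute is one more strong reference, which is why a method running on a live instance cannot make its own class go away.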

Related

Deleting the '__del__' method from an existing object in Python

I have an application with a ProcessPoolExecutor, to which I deliver an object instance that has a destructor implemented using the __del__ method.
The problem is, that the __del__ method deletes files from the disk, that are common to all the threads (processes). When a process in the pool finishes its job, it calls the __del__ method of the object it got and thus ruins the resources of the other threads (processes).
I tried to prepare a "safe" object, without a destructor, which I would use when submitting jobs to the pool:
my_safe_object = copy.deepcopy(my_object)
delattr(my_safe_object, '__del__')
But the delattr call fails with the following error:
AttributeError: __del__
Any idea how to get rid of the __del__ method of an existing object at runtime?
UPDATE - My solution:
Eventually I solved it using quite an elegant workaround:
class C:
    def __init__(self):
        self.orig_id = id(self)
        # ... CODE ...
    def __del__(self):
        if id(self) != self.orig_id:
            return
        # .... CODE ....
So the field orig_id is only computed for the original object, where the constructor is really executed. The other object "clones" are created using a deep-copy, so their orig_id value will contain the id of the original object. Thus, when the clones are destroyed and call __del__, they will compare their own id with the original object id and will return, as the IDs will not match. Thus, only the original object will pass into executing __del__.
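A small sketch of why the comparison distinguishes clones from the original: copy.deepcopy does not run __init__, so the clone carries the original's orig_id while having a different id of its own. The is_original method is a hypothetical helper added for the illustration. One caveat worth keeping in mind: id values are only guaranteed to be unique among objects that are alive at the same time in the same process.

```python
import copy


class C:
    def __init__(self):
        # Only runs for objects created through the constructor.
        self.orig_id = id(self)

    def is_original(self):
        # Hypothetical helper: the same check the __del__ above performs.
        return id(self) == self.orig_id


original = C()
clone = copy.deepcopy(original)   # copies orig_id without calling __init__

print(original.is_original(), clone.is_original())
```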
The best thing to do here, if you have access to the object's class code, is not to rely on __del__ at all. A __del__ with a permanent side effect can be a problem by itself, but in an environment using multiprocessing it is definitely a no-go!
Here is why. First, __del__ is a method that lives on the instance's class, like most "magic" methods (which is why you can't delete it from an instance). Second, __del__ is called when the number of references to an object reaches zero. However, having no references left to an object in the "master" process does not mean that all the child processes are done with it. This is likely the source of your problem: reference counting for objects is independent in each process. Third, you don't have much control over when __del__ is called, even in a single-process application. It is not hard to leave a dangling reference to an object in a dictionary or a cache somewhere, so tying important application behavior to __del__ is normally discouraged. And all of this applies only to recent Python versions (~ > 3.5); before that, __del__ was even more unreliable, and Python would not ensure it was called at all.
So, as the other answers put it, you could try to neutralize __del__ directly on the class, but that would have to be done on the object's class in all the sub-processes as well.
Therefore, what I recommend is to have a method, called explicitly, that performs the file erasing and other side effects when disposing of an object. You simply rename your __del__ method and call it only in the main process.
If you want to ensure this "destructor" is called, Python does offer some automatic control with the context manager protocol: you use your objects within a with statement block and do the destruction inside an __exit__ method. This method is called automatically at the end of the with block. Of course, you will have to devise a way for the with block to be left only when the work on the instance in the subprocess has finished. That is why, in this case, I think an ordinary, explicit clean-up method, called in your main process when consuming the "result" of whatever you executed off-process, would be easier.
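As a rough single-process illustration of the explicit clean-up approach, the sketch below moves the file deletion into an ordinary cleanup method and lets __exit__ call it. TempResource and its attributes are hypothetical names, and the temporary file merely stands in for whatever the real object owns on disk.

```python
import os
import tempfile


class TempResource:
    """Hypothetical object that owns a file on disk."""

    def __init__(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)
        self.cleaned = False

    def cleanup(self):
        # Explicit, idempotent replacement for what __del__ used to do.
        if not self.cleaned:
            os.remove(self.path)
            self.cleaned = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.cleanup()
        return False  # do not swallow exceptions raised inside the block


with TempResource() as res:
    existed_inside = os.path.exists(res.path)

exists_after = os.path.exists(res.path)
print(existed_inside, exists_after)
```

Because cleanup is a normal method, copies of the object that end up in worker processes never trigger it unless the code there calls it on purpose.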
TL;DR
Change your source object's class clean-up code from __del__ to an ordinary method, like cleanup.
When submitting your instances for off-process execution, call the clean-up in your main process, using the concurrent.futures.as_completed call.
In case you can't change the source code of the object's class, inherit from it, override __del__ with a no-op method, and force the object's __class__ attribute to the inherited class before submitting it to other processes:
import concurrent.futures

class SafeObject(BombObject):
    def __del__(self):
        pass

def execute(obj):
    # this function is executed in another process
    ...

def execute_all(obj_list):
    executor = concurrent.futures.ProcessPoolExecutor(max_workers=XX)
    with executor:
        futures = {}
        for obj in obj_list:
            obj.__class__ = SafeObject
            futures[executor.submit(execute, obj)] = obj
        for future in concurrent.futures.as_completed(futures):
            value = future.result() # add try/except around this as needed.
            obj = futures[future] # look up the object this future was submitted with
            BombObject.__del__(obj) # Or just restore the "__class__" if the instances will be needed elsewhere
        del futures # Needed to clean up the extra references to the objects held in the futures dict.
(please note that the "with" statement above is from the recommended usage for ProcessPoolExecutor, from the docs, not for the custom __exit__ method I suggested you using earlier in the answer. Having a with block equivalent that will allow you to take full advantage of the ProcessPoolExecutor will require some ingenuity into it)
In general, methods belong to the class. While generally you can shadow a method on an instance, special "dunder" methods are optimized to check the class first regardless. So consider:
In [1]: class Foo:
   ...:     def __int__(self):
   ...:         return 42
   ...:
In [2]: foo = Foo()
In [3]: int(foo)
Out[3]: 42
In [4]: foo.__int__ = lambda self: 43
In [5]: int(foo)
Out[5]: 42
You can read more about this behavior in the docs
For custom classes, implicit invocations of special methods are only guaranteed to work correctly if defined on an object’s type, not in the object’s instance dictionary.
I think the cleanest solution if you are using multiprocessing is to simply derive from the class and override __del__. I fear that monkey-patching the class will not play nicely with multiprocessing, unless you monkey-patch the class in all the processes. I am not sure how the pickling will work out here.
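The two observations above (the delattr failure and the subclass override) can be tried in a single process. This is only a sketch, assuming CPython's immediate reference-count finalization; Bomb, Safe and the calls list are hypothetical stand-ins for the real class and its side effects.

```python
class Bomb:
    calls = []  # records finalizer runs instead of deleting files

    def __del__(self):
        Bomb.calls.append("bomb")


class Safe(Bomb):
    def __del__(self):
        pass  # no-op override, defined on the class where Python looks for it


obj = Bomb()
try:
    delattr(obj, "__del__")      # __del__ lives on the class, not the instance
    delattr_failed = False
except AttributeError:
    delattr_failed = True

obj.__class__ = Safe             # re-class the existing instance
del obj                          # Safe.__del__ runs, nothing is recorded
calls_after_safe = list(Bomb.calls)

other = Bomb()
del other                        # Bomb.__del__ runs
calls_after_bomb = list(Bomb.calls)

print(delattr_failed, calls_after_safe, calls_after_bomb)
```

Whether the re-classed object survives pickling into a worker process depends on Safe being importable there, which this single-process sketch does not exercise.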

Calling instance method using class definition in Python

Lately, I've been studying Python's class instantiation process to really understand what happens under the hood when creating a class instance. But, while playing around with test code, I came across something I don't understand.
Consider this dummy class
class Foo():
    def test(self):
        print("I'm using test()")
Normally, if I wanted to use Foo.test instance method, I would go and create an instance of Foo and call it explicitly like so,
foo_inst = Foo()
foo_inst.test()
>>>> I'm using test()
But, I found that calling it that way ends up with the same result,
Foo.test(Foo)
>>>> I'm using test()
Here I don't actually create an instance, but I'm still accessing Foo's instance method. Why and how does this work in Python? I mean, self normally refers to the current instance of the class, but I'm not technically creating a class instance in this case.
print(Foo()) #This is a Foo object
>>>><__main__.Foo object at ...>
print(Foo) #This is not
>>>> <class '__main__.Foo'>
Props to everyone that led me there in the comments section.
The answer to this question relies on two fundamentals of Python:
Duck-typing
Everything is an object
Indeed, even if self is Python's idiom to reference the current class instance, you technically can pass whatever object you want because of how Python handles typing.
Now, the other confusion that brought me here is that I wasn't creating an object in my second example. But, the thing is, Foo is already an object internally.
This can be tested empirically like so,
print(type(Foo))
<class 'type'>
So, we now know that Foo is an instance of class type and therefore can be passed as self even though it is not an instance of itself.
Basically, if I were to manipulate self as if it were a Foo object in my test method, I would run into problems when calling it as in my second example.
A few notes on your question (and answer). First, everything really is an object. Even a class is an object, so there is the class of the class (called the metaclass), which is type in this case.
Second, and more relevant to your case: methods are, more or less, class attributes, not instance attributes. In Python, when you have an object obj, an instance of Class, and you access obj.x, Python first looks into obj and then into Class. That is what happens when you access a method from an instance: methods are just special class attributes, so they can be accessed from both the instance and the class. And since you are not using any instance attributes of the self that should be passed to the test(self) function, it does not matter which object is passed.
To understand this in depth, you should read about the descriptor protocol, if you are not familiar with it. It explains a lot about how things work in Python. It allows Python classes and objects to be essentially dictionaries with some special attributes (very similar to JavaScript objects and methods).
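A short sketch of what the descriptor protocol does here, assuming Python 3: the function stored in the class dictionary is a descriptor, and calling its __get__ with an instance is what produces the bound method. In this variant test returns the string instead of printing it so the results can be compared; the variable names are illustrative only.

```python
class Foo:
    def test(self):
        return "I'm using test()"


foo = Foo()

plain_function = Foo.__dict__["test"]            # what is stored on the class
bound_method = plain_function.__get__(foo, Foo)  # what foo.test builds on access

same_function = Foo.test is plain_function       # class access gives the function itself
result_bound = bound_method()                    # self is foo
result_class = Foo.test(Foo)                     # self is the class object
result_other = Foo.test(42)                      # self is an unrelated object

print(same_function, result_bound == result_class == result_other)
```

Since test never touches self, any object passed explicitly in the first position works; the call only breaks once the body starts using attributes that the passed object lacks.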
Regarding the class instantiation, see about __new__ and metaclasses.

Garbage collect a class with a reference to its instance?

Consider this code snippet:
import gc
from weakref import ref

def leak_class(create_ref):
    class Foo(object):
        # make cycle non-garbage collectable
        def __del__(self):
            pass
    if create_ref:
        # create a strong reference cycle
        Foo.bar = Foo()
    return ref(Foo)

# without reference cycle
r = leak_class(False)
gc.collect()
print r() # prints None
# with reference cycle
r = leak_class(True)
gc.collect()
print r() # prints <class '__main__.Foo'>
It creates a reference cycle that cannot be collected, because the referenced instance has a __del__ method. The cycle is created here:
# create a strong reference cycle
Foo.bar = Foo()
This is just a proof of concept; the reference could be added by some external code, a descriptor or anything. If that's not clear to you, remember that each object maintains a reference to its class:
+-------------+             +--------------------+
|             |   Foo.bar   |                    |
| Foo (class) +------------>| foo (Foo instance) |
|             |             |                    |
+-------------+             +----------+---------+
      ^                                |
      |         foo.__class__          |
      +--------------------------------+
If I could guarantee that Foo.bar is only accessed from Foo, the cycle wouldn't be necessary, as theoretically the instance could hold only a weak reference to its class.
Can you think of a practical way to make this work without a leak?
As some are asking why external code would modify a class but can't control its lifecycle, consider this example, similar to the real-life example I was working on:
class Descriptor(object):
    def __get__(self, obj, kls=None):
        if obj is None:
            try:
                obj = kls._my_instance
            except AttributeError:
                obj = kls()
                kls._my_instance = obj
        return obj.something()

# usage example #
class Example(object):
    foo = Descriptor()
    def something(self):
        return 100

print Example.foo
In this code only Descriptor (a non-data descriptor) is part of the API I'm implementing. Example class is an example of how the descriptor would be used.
Why does the descriptor store a reference to an instance inside the class itself? Basically for caching purposes. Descriptor required this contract with the implementor: it would be used in any class assuming that
The class has a constructor with no args, that gives an "anonymous instance" (my definition)
The class has some behavior-specific methods (something here).
An instance of the class can stay alive for an undefined amount of time.
It doesn't assume anything about:
How long it takes to construct an object
Whether the class implements __del__ or other magic methods
How long the class is expected to live
Moreover the API was designed to avoid any extra load on the class implementor. I could have moved the responsibility for caching the object to the implementor, but I wanted a standard behavior.
There actually is a simple solution to this problem: make the default behavior to cache the instance (like it does in this code) but allow the implementor to override it if they have to implement __del__.
Of course this wouldn't be as simple if we assumed that the class state had to be preserved between calls.
As a starting point, I was coding a "weak object", an implementation of object that only kept a weak reference to its class:
from weakref import proxy

def make_proxy(strong_kls):
    kls = proxy(strong_kls)
    class WeakObject(object):
        def __getattribute__(self, name):
            try:
                attr = kls.__dict__[name]
            except KeyError:
                raise AttributeError(name)
            try:
                return attr.__get__(self, kls)
            except AttributeError:
                return attr
        def __setattr__(self, name, value):
            # TODO: implement...
            pass
    return WeakObject

Foo.bar = make_proxy(Foo)()
It appears to work for a limited number of use cases, but I'd have to reimplement the whole set of object methods, and I don't know how to deal with classes that override __new__.
For your example, why don't you store _my_instance in a dict on the descriptor class, rather than on the class holding the descriptor? You could use a weakref or WeakValueDictionary in that dict, so that when the object disappears the dict will just lose its reference and the descriptor will create a new one on the next access.
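One possible reading of this suggestion, written for Python 3 rather than the Python 2 of the question: the descriptor keeps its own WeakValueDictionary keyed by class, so the class never holds a strong reference to its cached instance. The _instances attribute is a hypothetical name, and the sketch assumes instances of the owning class can be weakly referenced (no restrictive __slots__).

```python
import weakref


class Descriptor:
    def __init__(self):
        # Assumption: one cache per descriptor, mapping class -> instance.
        # Values are held weakly, so the cache alone keeps nothing alive.
        self._instances = weakref.WeakValueDictionary()

    def __get__(self, obj, kls=None):
        if obj is None:
            obj = self._instances.get(kls)
            if obj is None:
                obj = kls()
                self._instances[kls] = obj
        return obj.something()


class Example:
    foo = Descriptor()

    def something(self):
        return 100


value = Example.foo
stored_on_class = hasattr(Example, "_my_instance")
print(value, stored_on_class)
```

The trade-off is that the cached instance only survives while something else references it; if nothing does, a fresh instance is built on the next class-level access, which matches the behaviour described above.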
Edit: I think you have a misunderstanding about the possibility of collecting the class while the instance lives on. Methods in Python are stored on the class, not the instance (barring peculiar tricks). If you have an object obj of class Class, and you allowed Class to be garbage collected while obj still exists, then calling a method obj.meth() on the object would fail, because the method would have disappeared along with the class. That is why your only option is to weaken your class->obj reference; even if you could make objects weakly reference their class, all it would do is break the class if the weakness ever "took effect" (i.e., if the class were collected while an instance still existed).
The problem you're facing is just a special case of the general ref-cycle-with-__del__ problem.
I don't see anything unusual in the way the cycles are created in your case, which is to say, you should resort to the standard ways of avoiding the general problem.
I think implementing and using a weak object would be hard to get right, and you would still need to remember to use it in all places where you define __del__. It doesn't sound like the best approach.
Instead, you should try the following:
consider not defining __del__ in your class (recommended)
in classes which define __del__, avoid reference cycles (in general, it might be hard/impossible to make sure no cycles are created anywhere in your code. In your case, seems like you want the cycles to exist)
explicitly break the cycles, using del (if there are appropriate points to do that in your code)
scan the gc.garbage list, and explicitly break reference cycles (using del)

Why is __getattribute__ not invoked on an implicit __getitem__-invocation?

While trying to wrap arbitrary objects, I came across a problem with dictionaries and lists. Investigating, I managed to come up with a simple piece of code whose behaviour I simply do not understand. I hope some of you can tell me what is going on:
>>> class Cl(object): # simple class that prints (and suppresses) each attribute lookup
...     def __getattribute__(self, name):
...         print 'Access:', name
...
>>> i = Cl() # instance of class
>>> i.test # test that __getattribute__ override works
Access: test
>>> i.__getitem__ # test that it works for special functions, too
Access: __getitem__
>>> i['foo'] # but why doesn't this work?
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'Cl' object has no attribute '__getitem__'
Magic __methods__() are treated specially: They are internally assigned to "slots" in the type data structure to speed up their look-up, and they are only looked up in these slots. If the slot is empty, you get the error message you got.
See Special method lookup for new-style classes in the documentation for further details. Excerpt:
In addition to bypassing any instance attributes in the interest of correctness, implicit special method lookup generally also bypasses the __getattribute__() method even of the object’s metaclass.
[…]
Bypassing the __getattribute__() machinery in this fashion provides significant scope for speed optimisations within the interpreter, at the cost of some flexibility in the handling of special methods (the special method must be set on the class object itself in order to be consistently invoked by the interpreter).

Python, __slots__, inheritance, and class variables ==> attribute is read-only bug

I have a big tree with hundreds of thousands of nodes, and I'm using __slots__ to reduce the memory consumption. I just found a very strange bug and fixed it, but I don't understand the behavior that I saw.
Here's a simplified code sample:
class NodeBase(object):
    __slots__ = ["name"]
    def __init__(self, name):
        self.name = name

class NodeTypeA(NodeBase):
    name = "Brian"
    __slots__ = ["foo"]
I then execute the following:
>>> node = NodeTypeA("Monty")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in __init__
AttributeError: 'NodeTypeA' object attribute 'name' is read-only
There is no error if NodeTypeA.name is not defined (side note: that attribute was there by mistake, and had no reason for being there). There is also no error if NodeTypeA.__slots__ is never defined, and it therefore has a __dict__.
The thing I don't understand is: why does the existence of a class variable in a subclass interfere with setting an instance variable in a slot defined in the parent class?
Can anybody explain why this combination results in the object attribute is read-only error? I know my example is contrived, and is unlikely to be intentional in a real program, but that doesn't make this behavior any less strange.
Thanks,
Jonathan
A smaller example:
class C(object):
    __slots__ = ('x',)
    x = 0

C().x = 1
The documentation on slots states at one point:
__slots__ are implemented at the class level by creating descriptors (Implementing Descriptors) for each variable name. As a result, class attributes cannot be used to set default values for instance variables defined by __slots__; otherwise, the class attribute would overwrite the descriptor assignment.
When __slots__ is in use, attribute assignment to slot attributes needs to go through the descriptors created for the slot attributes. Shadowing the descriptors in a subclass causes Python to be unable to find the routine needed to set the attribute. Python can still see that an attribute is there, though (because it finds the object that's shadowing the descriptor), so it reports that the attribute is read-only.
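The shadowing can be reproduced and avoided as follows. This is a sketch for Python 3; Broken and Fixed are hypothetical names, and supplying the default through __init__ is just one way to keep the slot descriptor visible.

```python
class NodeBase:
    __slots__ = ("name",)

    def __init__(self, name):
        self.name = name


class Broken(NodeBase):
    __slots__ = ("foo",)
    name = "Brian"   # shadows the slot descriptor NodeBase created for "name"


class Fixed(NodeBase):
    __slots__ = ("foo",)

    def __init__(self, name="Brian"):
        # The default is supplied at construction time, so the inherited
        # descriptor stays reachable and assignment works.
        super().__init__(name)


try:
    Broken("Monty")
    error = None
except AttributeError as exc:
    error = exc

descriptor_type = type(NodeBase.__dict__["name"]).__name__
print(type(error).__name__, descriptor_type, Fixed().name, Fixed("Monty").name)
```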
