The title of this post pretty much sums up my question - will threads waiting on an Event be notified if that event has been garbage collected? In my particular case I have a class whose instances have an Event as an attribute, and I'm wondering whether I should implement a __del__ method on this class that calls self.event.set() before it's garbage collected.
I'm new to asynchronicity, so if events aren't set() when they're garbage collected, perhaps it's bad practice to set them manually, and better to let threads hang? Thanks in advance for any responses.
Since other objects hold a reference to the event, the event itself won't be deleted or garbage collected. It has no idea that your object is being deleted. Whether you want your class to have a __del__ that sets the event when the object is deleted (either naturally through having its ref count go to zero or through garbage collection) is entirely dependent on your event system design. Suppose I have a dozen objects referencing the event. Do I want the event fired when each one goes away? Depends!
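If your design does call for it, the pattern is straightforward. This is a minimal sketch (the class name Owner is hypothetical), and it relies on CPython's deterministic reference counting for __del__ to run immediately:

```python
import threading

class Owner:
    def __init__(self, event):
        self.event = event

    def __del__(self):
        # Wake any waiters when this owner is finalized
        self.event.set()

e = threading.Event()
o = Owner(e)
del o              # CPython: refcount hits zero, __del__ runs immediately
print(e.is_set())  # True
```

Note that with multiple owners sharing one event, the first one to die releases all waiters, which may or may not be what you want.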
Note that it's not necessarily the case that waiting for an Event implies the Event isn't in trash. Cyclic trash is one possibility, and here's another:
import threading

class C(object):
    def __init__(self):
        self.e = threading.Event()

    def __del__(self):
        print("going away")

def f():
    C().e.wait()

t = threading.Thread(target=f)
t.start()
print("main ending")
That prints:
going away
main ending
and then it hangs forever, as Python attempts to .join() the thread as part of interpreter shutdown processing.
The function f(), run in a thread, creates an instance of C that becomes trash immediately after its e attribute is retrieved. So its __del__ method is called, and "going away" is displayed.
You can infer from the behavior that, no, a trash Event does not get set by magic. But it's not going to come up in practice, so don't worry about it ;-)
Related
I'm using a couple of class attributes to keep track of aggregate task completion across multiple instances of a class. When reading or updating the class attributes, do I need to use a lock of some sort?
class ClassAttrExample:
    of_type_list = []
    of_type_int = 0

    def __init__(self, name):
        self.name = name

    def do_task(self):
        # does some stuff
        # do I need a lock context here???
        self.of_type_list.append(self.name)
        self.of_type_int += 1
If no threads are involved, no locks are required just because class instances share data. As long as the operations are performed in the same thread, everything is safe.
If threads are involved, you'll want locks.
For the specific case of CPython (the reference interpreter), as an implementation detail, the .append call does not require a lock. The GIL can only be switched out between bytecodes (or when a bytecode calls into C code that explicitly releases it, which list never does), and list.append is effectively atomic as a result (all the work it does occurs within a single CALL_METHOD bytecode which never calls back into Python level code, so the GIL is definitely held the whole time).
By contrast, += involves reading the input operand, then performing the increment, then reassigning the input, and the GIL can be swapped between those operations, leading to missed increments when two threads read the value before either writes back to it.
So if multithreaded access is possible, for the int case, the lock is required. And given you need the lock anyway, you may as well lock around the append call too, ensuring the code remains portable to GIL-free Python interpreters.
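To make the race concrete, here is a minimal standalone demo (the names counter, lock, and bump are hypothetical) of the locked increment; remove the with lock: line and the final count will usually fall short:

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n):
    global counter
    for _ in range(n):
        with lock:
            counter += 1  # read-modify-write, atomic only under the lock

threads = [threading.Thread(target=bump, args=(50_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000
```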
A fully portable thread-safe version of your class would look something like:
import threading

class ClassAttrExample:
    _lock = threading.Lock()
    of_type_list = []
    of_type_int = 0

    def __init__(self, name):
        self.name = name

    def do_task(self):
        # does some stuff
        with self._lock:
            # Can't use a bare name to refer to a class attribute; must access
            # through the class or an instance thereof
            self.of_type_list.append(self.name)  # load-only access to of_type_list
                                                 # can use self directly
            type(self).of_type_int += 1  # must use type(self) to avoid creating
                                         # an instance attribute that shadows the
                                         # class attribute on store
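A quick sanity check of the locked version (self-contained, so the class is repeated here in condensed form; the names are from the question):

```python
import threading

class ClassAttrExample:
    _lock = threading.Lock()
    of_type_list = []
    of_type_int = 0

    def __init__(self, name):
        self.name = name

    def do_task(self):
        with self._lock:
            self.of_type_list.append(self.name)
            type(self).of_type_int += 1

workers = [ClassAttrExample("task%d" % i) for i in range(8)]
threads = [threading.Thread(target=w.do_task) for w in workers]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ClassAttrExample.of_type_int)        # 8 -- no lost increments
print(len(ClassAttrExample.of_type_list))  # 8
```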
I have an application with a ProcessPoolExecutor, to which I deliver an object instance that has a destructor implemented using the __del__ method.
The problem is that the __del__ method deletes files from the disk that are common to all the processes. When a process in the pool finishes its job, it calls the __del__ method of the object it got and thus ruins the resources of the other processes.
I tried to prepare a "safe" object, without a destructor, which I would use when submitting jobs to the pool:
my_safe_object = copy.deepcopy(my_object)
delattr(my_safe_object, '__del__')
But the delattr call fails with the following error:
AttributeError: __del__
Any idea how to get rid of the __del__ method of an existing object at runtime?
UPDATE - My solution:
Eventually I solved it using quite an elegant workaround:
class C:
    def __init__(self):
        self.orig_id = id(self)
        # ... CODE ...

    def __del__(self):
        if id(self) != self.orig_id:
            return
        # .... CODE ....
So the field orig_id is only computed for the original object, where the constructor is really executed. The other object "clones" are created using a deep-copy, so their orig_id value will contain the id of the original object. Thus, when the clones are destroyed and call __del__, they will compare their own id with the original object id and will return, as the IDs will not match. Thus, only the original object will pass into executing __del__.
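A runnable sketch of that workaround, with the real clean-up code stubbed out (note that deepcopy does not re-run __init__, which is what makes the trick work):

```python
import copy

class C:
    def __init__(self):
        self.orig_id = id(self)

    def __del__(self):
        if id(self) != self.orig_id:
            return  # a deep-copied clone: skip the destructive clean-up
        # ... real clean-up would go here ...

orig = C()
clone = copy.deepcopy(orig)           # __init__ is NOT re-run; orig_id is copied
print(clone.orig_id == orig.orig_id)  # True
print(id(clone) == clone.orig_id)     # False -- the clone's __del__ will bail out
```

One caveat: ids are only unique among simultaneously live objects, so this comparison is reliable only while the original is still alive.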
The best thing to do here, if you have access to the object's class code, is not to rely on __del__ at all. The fact of __del__ having a permanent side effect could be a problem by itself, but in an environment using multiprocessing it is definitely a no-go!
Here is why: first, __del__ is a method that lives on the instance's class, as most "magic" methods do (and that is why you can't delete it from an instance). Second, __del__ is called when references to an object reach zero. However, if you don't have any reference to an object in the "master" process, that does not mean all the child processes are done with it. This is likely the source of your problem: reference counting for objects is independent in each process. And third, you don't have that much control over when __del__ is called, even in a single-process application. It is not hard to have a dangling reference to an object in a dictionary or cache somewhere, so tying important application behavior to __del__ is normally discouraged. And all of this is only for recent Python versions (~ > 3.5); prior to that, __del__ was even more unreliable, and Python did not ensure it was called at all.
So, as the other answers put it, you could try to neutralize __del__ directly on the class, but that would have to be done on the object's class in all the sub-processes as well.
Therefore the way I recommend you to do this is to have a method to be explicitly called that will perform the file-erasing and other side-effects when disposing of an object. You simply rename your __del__ method and call it just on the main process.
If you want to ensure this "destructor" is called, Python does offer some automatic control with the context protocol: you would then use your objects within a with statement block and destroy them inside an __exit__ method, which is called automatically at the end of the with block. Of course, you would have to devise a way for the with block to be left only when work on the instance in the subprocess has finished. That is why, in this case, I think an ordinary, explicit clean-up method, called in your main process when consuming the "result" of whatever you executed off-process, would be easier.
TL;DR
Change your source object's class clean-up code from __del__ to an ordinary method, like cleanup
On submitting your instances to off-process executing, call the clean-up in your main-process, by using the concurrent.futures.as_completed call.
In case you can't change the source code for the object's class, inherit it,
override __del__ with a no-op method, and force the object's __class__ attribute to the inherited class before submitting it to other processes:
import concurrent.futures

class SafeObject(BombObject):
    def __del__(self):
        pass

def execute(obj):
    # this function is executed in the other process
    ...

def execute_all(obj_list):
    executor = concurrent.futures.ProcessPoolExecutor(max_workers=XX)
    with executor:
        futures = {}
        for obj in obj_list:
            obj.__class__ = SafeObject
            futures[executor.submit(execute, obj)] = obj
        for future in concurrent.futures.as_completed(futures):
            value = future.result()  # add try/except around this as needed
            obj = futures[future]
            BombObject.__del__(obj)  # or just restore __class__ if the instances
                                     # will be needed elsewhere
    del futures  # needed to clean up the extra references to the objects
                 # created in the futures dict
(Please note that the with statement above follows the recommended usage for ProcessPoolExecutor from the docs, not the custom __exit__ method I suggested earlier in the answer. Building a with-block equivalent that takes full advantage of the ProcessPoolExecutor will require some ingenuity.)
In general, methods belong to the class. While generally you can shadow a method on an instance, special "dunder" methods are optimized to check the class first regardless. So consider:
In [1]: class Foo:
...: def __int__(self):
...: return 42
...:
In [2]: foo = Foo()
In [3]: int(foo)
Out[3]: 42
In [4]: foo.__int__ = lambda self: 43
In [5]: int(foo)
Out[5]: 42
You can read more about this behavior in the docs:
For custom classes, implicit invocations of special methods are only guaranteed to work correctly if defined on an object’s type, not in the object’s instance dictionary.
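The flip side is that patching the class does take effect for every instance. A condensed version of the transcript above, extended with a class-level patch:

```python
class Foo:
    def __int__(self):
        return 42

foo = Foo()
foo.__int__ = lambda: 43       # shadowing on the instance: ignored by int()
before = int(foo)              # still 42
Foo.__int__ = lambda self: 43  # patching the class *does* take effect
after = int(foo)               # now 43
print(before, after)           # 42 43
```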
I think the cleanest solution, if you are using multiprocessing, is to simply derive from the class and override __del__. I fear that monkey-patching the class will not play nice with multiprocessing unless you monkey-patch the class in all the processes. Not sure how the pickling will work out here.
In Python3.6, I use threading.local() to store some status for thread.
Here is a simple example to explain my question:
import threading

class Test(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.local = threading.local()
        self.local.test = 123

    def run(self):
        print(self.local.test)
When I start this thread:
t = Test()
t.start()
Python gives me an error:
AttributeError: '_thread._local' object has no attribute 'test'
It seems the test attribute cannot be accessed outside the __init__ function scope, because I can print the value inside __init__ after setting test=123 on local.
Is it necessary to use a threading.local object inside a Thread subclass? I would think the instance attributes of a Thread instance already keep the data thread safe.
Anyway, why does the threading.local object not work as expected between instance methods?
When you constructed your thread object you were using a DIFFERENT thread. When you execute the run method, you are starting a NEW thread, and that thread does not yet have the thread-local variable set. This is why you do not have your attribute: it was set on the thread constructing the thread object, not on the thread running it.
As stated in https://docs.python.org/3.6/library/threading.html#thread-local-data:
The instance’s values will be different for separate threads.
Test.__init__ executes in the caller's thread (e.g. the thread where t = Test() executes). Yes, it's a good place to create thread-local storage (TLS).
But when t.run executes, the TLS will have completely different contents -- the contents accessible only within the thread t.
TLS is good when you need to share data within the scope of the current thread. It is like a local variable inside a function, but for threads. When the thread finishes execution, its TLS disappears.
For inter-thread communication, Futures can be a good choice. Some others are condition variables, events, etc. See the threading docs page.
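A minimal sketch of the behavior, adapted from the question's class: each thread sees its own view of the local object, so the value set in __init__ (main thread) is invisible in run(), and each thread must populate its own copy:

```python
import threading

class Test(threading.Thread):
    def __init__(self):
        super().__init__()
        self.local = threading.local()
        self.local.test = 123  # visible only in the *creating* thread
        self.seen = None

    def run(self):
        # Each thread must populate its own view of the TLS object
        self.local.test = 456
        self.seen = self.local.test

t = Test()
t.start()
t.join()
print(t.seen)        # 456 -- the value set inside run()
print(t.local.test)  # 123 -- the main thread still sees its own value
```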
I have a subclass of threading.Thread. After instantiating it, it runs forever in the background.
class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.daemon = True
        self.start()

    def run(self):
        while True:
            <do something>
If I were to instantiate the thread from within another class, I would normally do so with
self.my_thread = MyThread()
In cases when I never thereafter have to access the thread, I have long wondered whether I can instead instantiate it simply with
MyThread()
(i.e., instantiate it without holding a reference). Will the thread eventually be garbage collected because there is no reference holding it?
It doesn't matter... you can test this easily with del self.my_thread, and you will see the thread continue running even though you deleted the only reference and forced garbage collection. That said, it is usually a good idea to hold a reference (so that you can set flags and whatnot for the other thread, although shared memory may be sufficient).
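The reason is that the threading module itself keeps a registry of running threads, so a started thread is never garbage collected while it runs. A quick demonstration (worker and stop are hypothetical names):

```python
import threading

def worker(stop):
    stop.wait()  # idle until told to exit

stop = threading.Event()
threading.Thread(target=worker, args=(stop,)).start()  # no reference kept

# The interpreter itself tracks running threads:
others = [t for t in threading.enumerate() if t is not threading.main_thread()]
print(len(others) >= 1)  # True -- the unreferenced thread is still alive
stop.set()
```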
Say I derive from threading.Thread:
from threading import Thread

class Worker(Thread):
    def start(self):
        self.running = True
        Thread.start(self)

    def terminate(self):
        self.running = False
        self.join()

    def run(self):
        import time
        while self.running:
            print "running"
            time.sleep(1)
Any instance of this class whose thread has been started must have its thread actively terminated before it can be garbage collected (the thread itself holds a reference). This is a problem, because it completely defeats the purpose of garbage collection. What I want is an object encapsulating a thread, such that when the last reference to the object goes out of scope, the destructor is called to terminate and clean up the thread. Thus a destructor
def __del__(self):
    self.terminate()
will not do the trick.
The only way I see to nicely encapsulate threads is by using the low-level thread builtin module and weakref weak references. Or maybe I am missing something fundamental. So is there a nicer way than tangling things up in weakref spaghetti code?
How about using a wrapper class (which has-a Thread rather than is-a Thread)?
eg:
class WorkerWrapper:
    def __init__(self):
        self.worker = Worker()

    def __del__(self):
        self.worker.terminate()
And then use these wrapper classes in client code, rather than threads directly.
Or perhaps I miss something (:
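A self-contained sketch of this has-a approach, with a minimal stand-in for the Worker class from the question (note it relies on CPython's deterministic refcounting for __del__ to fire promptly):

```python
import threading
import time

class Worker(threading.Thread):
    """Minimal stand-in for the Worker from the question."""
    def start(self):
        self.running = True
        threading.Thread.start(self)

    def terminate(self):
        self.running = False
        self.join()

    def run(self):
        while self.running:
            time.sleep(0.01)

class WorkerWrapper:
    def __init__(self):
        self.worker = Worker()
        self.worker.start()

    def __del__(self):
        self.worker.terminate()

w = WorkerWrapper()
del w  # CPython: refcount hits zero, __del__ terminates and joins the thread
print(threading.active_count())  # 1 -- only the main thread remains
```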
To add an answer inspired by #datenwolf's comment, here is another way to do it that deals with the object being deleted or the parent thread ending:
import threading
import time
import weakref

class Foo(object):
    def __init__(self):
        self.main_thread = threading.current_thread()
        self.initialised = threading.Event()
        self.t = threading.Thread(target=Foo.threaded_func,
                                  args=(weakref.proxy(self), ))
        self.t.start()
        while not self.initialised.is_set():
            # This loop is necessary to stop the main thread doing anything
            # until the exception handler in threaded_func can deal with the
            # object being deleted.
            pass

    def __del__(self):
        print 'self:', self, self.main_thread.is_alive()
        self.t.join()

    def threaded_func(self):
        self.initialised.set()
        try:
            while True:
                print time.time()
                if not self.main_thread.is_alive():
                    print('Main thread ended')
                    break
                time.sleep(1)
        except ReferenceError:
            print('Foo object deleted')

foo = Foo()
del foo
foo = Foo()
I guess you are a convert from C++ where a lot of meaning can be attached to scopes of variables, equalling lifetimes of variables. This is not the case for Python, and garbage collected languages in general.
Scope != Lifetime, simply because garbage collection occurs whenever the interpreter gets around to it, not on scope boundaries. Especially since you are trying to do asynchronous stuff with it, the raised hairs on your neck should vibrate to the clamour of all the warning bells in your head!
You can do stuff with the lifetime of objects, using 'del'.
(In fact, if you read the source of the CPython garbage collector module, the obvious (and somewhat funny) disdain expressed there for objects with finalizers (__del__ methods) should tell everybody to tie behavior to an object's lifetime only when truly necessary.)
You could use sys.getrefcount(self) to find out when to leave the loop in your thread. But I can hardly recommend that (just try out what numbers it returns. You won't be happy. To see who holds what just check gc.get_referrers(self)).
The reference count may/will depend on garbage collection as well.
Besides, tying the runtime of a thread of execution to scopes/lifetimes of objects is an error 99% of the time. Not even Boost does it. It goes out of its RAII way to define something called a 'detached' thread.
http://www.boost.org/doc/libs/1_55_0/doc/html/thread/thread_management.html