Python threading.Thread, scopes and garbage collection - python

Say I derive from threading.Thread:
import time
from threading import Thread
class Worker(Thread):
    def start(self):
        self.running = True
        Thread.start(self)
    def terminate(self):
        self.running = False
        self.join()
    def run(self):
        while self.running:
            print("running")
            time.sleep(1)
Any instance of this class whose thread has been started must have its thread actively terminated before it can be garbage collected (the thread holds a reference to itself). This is a problem, because it completely defeats the purpose of garbage collection. What I want is an object that encapsulates a thread, such that when the last instance of the object goes out of scope, the destructor is called to terminate the thread and clean up. Thus a destructor
    def __del__(self):
        self.terminate()
will not do the trick.
The only way I see to nicely encapsulate threads is to use the low-level thread built-in module and weakref weak references. Or I may be missing something fundamental. So is there a nicer way than tangling things up in weakref spaghetti code?

How about using a wrapper class (which has-a Thread rather than is-a Thread)?
e.g.:
class WorkerWrapper:
    def __init__(self):
        self.worker = Worker()
    def __del__(self):
        self.worker.terminate()
And then use these wrapper classes in client code, rather than threads directly.
Or perhaps I'm missing something (:

To add an answer inspired by @datenwolf's comment, here is another way to do it that deals with the object being deleted or the parent thread ending:
import threading
import time
import weakref
class Foo(object):
    def __init__(self):
        self.main_thread = threading.current_thread()
        self.initialised = threading.Event()
        self.t = threading.Thread(target=Foo.threaded_func,
                                  args=(weakref.proxy(self), ))
        self.t.start()
        while not self.initialised.is_set():
            # This loop is necessary to stop the main thread doing anything
            # until the exception handler in threaded_func can deal with the
            # object being deleted.
            pass
    def __del__(self):
        print('self:', self, self.main_thread.is_alive())
        self.t.join()
    def threaded_func(self):
        self.initialised.set()
        try:
            while True:
                print(time.time())
                if not self.main_thread.is_alive():
                    print('Main thread ended')
                    break
                time.sleep(1)
        except ReferenceError:
            print('Foo object deleted')
foo = Foo()
del foo
foo = Foo()
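As a variation on the same idea, here is a minimal sketch that passes the thread a weakref.ref instead of a proxy, so the loop can test for a dead owner explicitly rather than catching ReferenceError. The names Owner, _work and ticks are hypothetical, and the prompt cleanup after del relies on CPython's reference counting; other interpreters may release the object later.

```python
import threading
import time
import weakref


def _work(owner_ref, interval):
    """Loop until the owning object has been garbage collected."""
    while True:
        owner = owner_ref()      # temporary strong reference, or None
        if owner is None:
            return               # the owner is gone, so stop the thread
        owner.ticks += 1
        del owner                # drop the strong reference before sleeping
        time.sleep(interval)


class Owner(object):
    """Hypothetical object whose thread only ever sees a weak reference."""

    def __init__(self, interval=0.01):
        self.ticks = 0
        self.thread = threading.Thread(
            target=_work, args=(weakref.ref(self), interval))
        self.thread.daemon = True
        self.thread.start()


owner = Owner()
thread = owner.thread            # keep the Thread so we can join it later
time.sleep(0.05)
del owner                        # the last strong reference goes away
thread.join(2)
print(thread.is_alive())
```

Because the worker holds a strong reference only for the duration of one iteration, deleting the owner lets the loop notice on its next pass and return on its own.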

I guess you are a convert from C++, where a lot of meaning can be attached to the scopes of variables, which equal the lifetimes of variables. This is not the case for Python, or for garbage-collected languages in general.
Scope != lifetime, simply because garbage collection occurs whenever the interpreter gets around to it, not on scope boundaries. Especially as you are trying to do asynchronous stuff with it, the raised hairs on your neck should vibrate to the clamour of all the warning bells in your head!
You can do stuff with the lifetime of objects, using '__del__'.
(In fact, if you read the sources of the CPython garbage collector module, the obvious (and somewhat funny) disdain for objects with finalizers (__del__ methods) expressed there should tell everybody to rely on even the lifetime of an object only if necessary.)
You could use sys.getrefcount(self) to find out when to leave the loop in your thread, but I can hardly recommend that. Just try out what numbers it returns; you won't be happy. To see who holds what, check gc.get_referrers(self).
The reference count may/will depend on garbage collection as well.
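To make the point about sys.getrefcount and gc.get_referrers concrete, here is a small sketch assuming CPython. The class Probe and the variable names are hypothetical; the exact counts are an implementation detail, which is why only the relative change is inspected.

```python
import gc
import sys


class Probe(object):
    """Hypothetical object whose references we want to inspect."""


probe = Probe()
baseline = sys.getrefcount(probe)     # includes the temporary call argument

holder = [probe]                      # one extra strong reference
with_holder = sys.getrefcount(probe)

# gc.get_referrers lists the containers that point at the object.
referrers = gc.get_referrers(probe)
holder_is_referrer = any(r is holder for r in referrers)

print(baseline, with_holder, holder_is_referrer)
```

The absolute numbers are rarely what one expects, since the interpreter itself holds temporary references, which is a good reason not to drive a thread's loop condition from them.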
Besides, tying the runtime of a thread of execution to the scopes/lifetimes of objects is an error 99% of the time. Not even Boost does it: it goes out of its RAII way to define something called a 'detached' thread.
http://www.boost.org/doc/libs/1_55_0/doc/html/thread/thread_management.html

Related

object has no attribute while trying to define thread

This is the code:
import threading
class foo:
    def levelOne(self):
        def worker(self, i):
            print('doing hard work')
        def writer(self):
            print('writing workers work')
session = foo()
i=0
threads = list()
for i in range(0,5):
    thread = threading.Thread(target=session.levelOne.worker, args=(i,))
    thread.start()
    threads.append(thread)
writerThread = threading.Thread(target=session.levelOne.writer)
writerThread.start()
for thread in threads:
    thread.join()
writerThread.join()
5 workers should do the job and the writer should collect their results.
The error I get is: session object has no attribute worker
The workers are actually testers that do certain work in different "areas", while the writer keeps track of them without making the workers return any result.
It's important for this algorithm to be divided on layers like "levelOne", "levelTwo" etc. because they will all work together. This is the main reason why I keep the threading outside the class instead of the levelOne method.
Please help me understand where I'm wrong.
You certainly don't have "session object has no attribute worker" as the error message with the code you posted; the error should be "'function' object has no attribute 'worker'". And actually I don't know why you'd expect anything else: names defined within a function are local variables (hint: Python functions are objects just like any other), they do not become attributes of the function.
It's important for this algorithm to be divided on layers like "levelOne", "levelTwo"
Well, possibly, but that's not the proper design. If you want foo to be nothing but a namespace and levelOne, levelTwo etc. to be instances of some type having both a writer and a worker method, then you need to 1/ define your LevelXXX as classes, and 2/ build instances of those classes as attributes of your foo class, i.e.:
class LevelOne():
    def worker(self, i):
        # ...
        pass
    def writer(self):
        # ...
        pass
class foo():
    levelOne = LevelOne()
Now whether this is the correct design for your use case is not guaranteed in any way, but it's impossible to design a proper solution without knowing anything about the problem...
If it's possible could you explain why trying to access workers and writer as shown in question's code is bad design?
Well, for the mere reason that it doesn't work, to start with, obviously xD.
Note that you could return the "worker" and "writer" functions from the levelOne method, ie:
class foo:
    def levelOne(self):
        def worker(self, i):
            print('doing hard work')
        def writer(self):
            print('writing workers work')
        return worker, writer
session = foo()
worker, writer = session.levelOne()
# etc
but this is both convoluted (assuming the point is to let worker and writer share self, which is much more simply done using a proper LevelOne class and making worker and writer methods of this class) and inefficient (def is an executable statement, so with your solution the worker and writer functions are created anew - which is not free - on each call).
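For illustration, here is a sketch of how the class-based layout could be wired to the threading code from the question. It is written for Python 3 and makes several assumptions not in the original: the results queue, the message format, the expected parameter of writer, and the use of an instance attribute rather than a class attribute for levelOne are all hypothetical choices.

```python
import queue
import threading


class LevelOne(object):
    def __init__(self):
        # Hypothetical channel from the workers to the writer.
        self.results = queue.Queue()

    def worker(self, i):
        self.results.put("worker %d done" % i)

    def writer(self, expected):
        # Collect exactly `expected` messages, blocking until each arrives.
        return [self.results.get() for _ in range(expected)]


class foo(object):
    def __init__(self):
        self.levelOne = LevelOne()


session = foo()
threads = [threading.Thread(target=session.levelOne.worker, args=(i,))
           for i in range(5)]
collected = []
writer_thread = threading.Thread(
    target=lambda: collected.extend(session.levelOne.writer(len(threads))))

writer_thread.start()
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
writer_thread.join()
print(sorted(collected))
```

Here session.levelOne.worker is a bound method of a real object, so it is a valid thread target, and the writer receives the workers' output through the queue without the workers returning anything.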

Python threading.local() not working in Thread class

In Python 3.6, I use threading.local() to store some state for a thread.
Here is a simple example to explain my question:
import threading
class Test(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.local = threading.local()
        self.local.test = 123
    def run(self):
        print(self.local.test)
When I start this thread:
t = Test()
t.start()
Python gives me an error:
AttributeError: '_thread._local' object has no attribute 'test'
It seems the test attribute cannot be accessed outside the scope of the __init__ function, because I can print the value inside __init__ after setting the attribute test=123 on the local object.
Is it necessary to use a threading.local object inside a Thread subclass? I think the instance attributes of a Thread instance could keep the attributes thread-safe.
Anyway, why does the threading.local object not work as expected across instance methods?
When you constructed your thread object, you were running in a DIFFERENT thread. When the run method executes, it executes in a NEW thread, and that thread does not yet have the thread-local variable set. This is why your attribute is missing: it was set in the thread that constructed the thread object, not in the thread that runs it.
As stated in https://docs.python.org/3.6/library/threading.html#thread-local-data:
The instance’s values will be different for separate threads.
Test.__init__ executes in the caller's thread (e.g. the thread where t = Test() executes). Yes, it's a good place to create thread-local storage (TLS).
But when t.run executes, it will see completely different contents: the contents accessible only within the thread t.
TLS is good when you need to share data within the scope of the current thread. It is just like a local variable inside a function, but for threads. When the thread finishes execution, its TLS disappears.
For inter-thread communication, Futures can be a good choice. Others include condition variables, events, etc. See the threading docs page.
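The behaviour described in these answers can be checked with a short sketch. The attributes seen_in_run and value_in_run are hypothetical helpers added only to record what the new thread observes.

```python
import threading


class Test(threading.Thread):
    def __init__(self):
        super().__init__()
        self.local = threading.local()
        self.local.test = 123        # stored for the *creating* thread only
        self.seen_in_run = None      # plain attributes are shared as usual
        self.value_in_run = None

    def run(self):
        # In the new thread the thread-local namespace starts out empty.
        self.seen_in_run = hasattr(self.local, "test")
        self.local.test = 456        # this value belongs to the new thread
        self.value_in_run = self.local.test


t = Test()
t.start()
t.join()
print(t.seen_in_run, t.value_in_run, t.local.test)
```

The creating thread still sees its own value in t.local.test after the join, while the value set inside run was visible only to the thread that set it. Ordinary instance attributes such as seen_in_run are shared between threads as normal.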

Python - is `threading.Event` "set" during garbage collection?

The title of this post pretty much sums up my question - will threads waiting on an Event be notified if that event has been garbage collected? In my particular case I have a class whose instances have an Event as an attribute, and I'm wondering whether I should implement a __del__ method on this class that calls self.event.set() before it's garbage collected.
I'm new to asynchronicity, so if events don't set() when they're garbage collected, perhaps it's bad practice to do so, and better to let threads hang? Thanks in advance for any responses.
Since other objects hold a reference to the event, the event itself won't be deleted or garbage collected. It has no idea that your object is being deleted. Whether you want your class to have a __del__ that sets the event when the object is deleted (either naturally through having its ref count go to zero or through garbage collection) is entirely dependent on your event system design. Suppose I have a dozen objects referencing the event. Do I want the event fired when each one goes away? Depends!
Note that it's not necessarily the case that waiting for an Event implies the Event isn't in trash. Cyclic trash is one possibility, and here's another:
import threading
class C(object):
    def __init__(self):
        self.e = threading.Event()
    def __del__(self):
        print("going away")
def f():
    C().e.wait()
t = threading.Thread(target=f)
t.start()
print("main ending")
That prints:
going away
main ending
and then it hangs forever, as Python attempts to .join() the thread as part of interpreter shutdown processing.
The function f(), run in a thread, creates an instance of C that becomes trash immediately after its e attribute is retrieved. So its __del__ method is called, and "going away" is displayed.
You can infer from the behavior that, no, a trash Event does not get set by magic. But it's not going to come up in practice, so don't worry about it ;-)
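A variant of the example above that does not hang can be sketched by giving wait() a timeout and recording what happens. The messages list and the 0.2 second timeout are assumptions made for illustration, and the ordering relies on CPython's reference counting finalizing the temporary C instance as soon as its e attribute has been fetched.

```python
import threading

messages = []


class C(object):
    def __init__(self):
        self.e = threading.Event()

    def __del__(self):
        messages.append("going away")


def f():
    # The temporary C instance becomes trash once .e has been retrieved.
    was_set = C().e.wait(timeout=0.2)
    messages.append("wait returned %s" % was_set)


t = threading.Thread(target=f)
t.start()
t.join()
print(messages)
```

Event.wait returns False when the timeout expires without the flag being set, so the recorded result shows that finalizing the owner did not set the event.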

Python: Holding a reference to a subclass of threading

I have a subclass of threading.Thread. After instantiating it, it runs forever in the background.
class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.daemon = True
        self.start()
    def run(self):
        while True:
            <do something>
If I were to instantiate the thread from within another class, I would normally do so with
self.my_thread = MyThread()
In cases when I never thereafter have to access the thread, I have long wondered whether I can instead instantiate it simply with
MyThread()
(i.e., instantiate it without holding a reference). Will the thread eventually be garbage collected because there is no reference holding it?
It doesn't matter. You can test it easily with del self.my_thread, and you should see the thread continue running even though you deleted the only reference and forced garbage collection. That said, it is usually a good idea to hold a reference (so that you can set flags and whatnot for the other thread, although shared memory may be sufficient).
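One way to check this, sketched under the assumption that blocking on an Event is an acceptable stand-in for the real work, is to start the thread without keeping a reference and then look for it in threading.enumerate(). The thread name and the stop event are hypothetical additions.

```python
import gc
import threading

stop = threading.Event()             # hypothetical way to end the demo thread


class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self, name="background-worker")
        self.daemon = True
        self.start()

    def run(self):
        stop.wait()                  # stands in for the real work


MyThread()                           # no reference is kept
gc.collect()                         # force a collection pass

alive_names = [t.name for t in threading.enumerate()]
still_running = "background-worker" in alive_names
print(still_running)

stop.set()                           # let the thread finish
```

The threading module keeps its own record of every running thread, which is why the Thread object survives without a user-held reference.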

Python class instance starts method in new thread

I spent the last hour(s???) looking/googling for a way to have a class start one of its methods in a new thread as soon as it is instantiated.
I could run something like this:
from threading import Thread
from time import sleep
x = myClass()
def updater():
    while True:
        x.update()
        sleep(0.01)
update_thread = Thread(target=updater)
update_thread.daemon = True
update_thread.start()
A more elegant way would be to have the class do it in __init__ when it is instantiated.
Imagine having 10 instances of that class...
Until now I couldn't find a (working) solution for this problem...
The actual class is a timer and the method is an update method that updates all the counter's variables. As this class also has to run functions at a given time it is important that the time updates won't be blocked by the main thread.
Any help is much appreciated. Thx in advance...
You can subclass directly from Thread in this specific case
from threading import Thread
from time import sleep
class MyClass(Thread):
    def __init__(self, other, arguments, here):
        super(MyClass, self).__init__()
        self.daemon = True
        self.cancelled = False
        # do other initialization here
    def run(self):
        """Overridden Thread.run; runs the update
        method once every 10 milliseconds."""
        while not self.cancelled:
            self.update()
            sleep(0.01)
    def cancel(self):
        """End this timer thread"""
        self.cancelled = True
    def update(self):
        """Update the counters"""
        pass
# placeholder values for the constructor arguments
my_class_instance = MyClass(None, None, None)
# explicit start is better than implicit start in constructor
my_class_instance.start()
# you can kill the thread with
my_class_instance.cancel()
In order to run a function (or member function) in a thread, use this:
th = Thread(target=some_func)
th.daemon = True
th.start()
Compared with deriving from Thread, this has the advantage that you don't export all of Thread's public functions as your own public functions. Actually, you don't even need to write a class to use this code; self.function and global_function are both equally usable as the target here.
I'd also consider using a context manager to start/stop the thread, otherwise the thread might stay alive longer than necessary, leading to resource leaks and errors on shutdown. Since you're putting this into a class, start the thread in __enter__ and join with it in __exit__.
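A minimal sketch of that context-manager approach follows. The class name Updater, the ticks counter, and the use of a threading.Event as the stop flag are assumptions for illustration, not part of the original answers.

```python
import threading
import time


class Updater(object):
    """Hypothetical timer-like object that owns a background thread."""

    def __init__(self, interval=0.01):
        self.interval = interval
        self.ticks = 0
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True

    def update(self):
        self.ticks += 1

    def _run(self):
        # Event.wait doubles as the sleep and returns True once stop is set.
        while not self._stop_event.wait(self.interval):
            self.update()

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._stop_event.set()
        self._thread.join()
        return False                 # do not swallow exceptions


with Updater() as updater:
    time.sleep(0.2)                  # the main thread does its own work here

print(updater.ticks > 0, updater._thread.is_alive())
```

Leaving the with block always stops and joins the thread, even when the body raises, so the thread's lifetime is tied to an explicit scope rather than to garbage collection.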
