Python thread termination requirements / is join(ing) required and necessary? [duplicate] - python

I understand the purpose of joining a thread; I'm asking about resource use. My specific use case is a long-running process that spawns many threads and, during operation, checks whether they have terminated and then cleans them up. The main thread waits on inotify events and spawns threads based on those, so it can't block on join() calls, because it needs to block on inotify calls.
I know that with pthreads, for instance, not joining a terminated thread will cause a resource leak:
PTHREAD_JOIN(3): Failure to join with a thread that is joinable (i.e., one that is not detached), produces a "zombie thread". Avoid doing this, since each zombie thread consumes some system resources, and when enough zombie threads have accumulated, it will no longer be possible to create new threads (or processes).
Python's documentation says no such thing, but it also doesn't say that join() can safely be skipped when many threads are expected to end on their own, unjoined, during normal operation.
I'm wondering, can I simply take my thread list and do the following:
threads = [thread for thread in threads if thread.is_alive()]
For each check, or will this leak? Or must I do the following?
alive_threads = list()
for thread in threads:
    if thread.is_alive():
        alive_threads.append(thread)
    else:
        thread.join()
threads = alive_threads

TLDR: No. Thread cleans up the underlying resources by itself.
Thread.join merely waits for the thread to end; it does not perform cleanup. Basically, each Thread has a lock that is released when the thread is done and subsequently cleaned up. Thread.join just waits for the lock to be released.
There is some minor cleanup done by Thread.join, namely removing the lock and setting a flag to mark the thread as dead. This is an optimisation to avoid needlessly waiting for the lock. These are internal, however, and also performed by all other public methods relying on the lock and flag. Finally, this cleanup is functionally equivalent to a Thread being cleaned up automatically due to garbage collection.
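As a minimal sketch of the non-blocking cleanup the question asks about, the following drops finished threads from a list without ever calling join(). The worker function, the delays, and the prune helper are hypothetical stand-ins, not part of the original question.

```python
import threading
import time

def worker(delay):
    # Hypothetical stand-in for the real per-event work.
    time.sleep(delay)

def prune(threads):
    # Keep only threads that are still running; finished ones are simply dropped.
    return [t for t in threads if t.is_alive()]

threads = [threading.Thread(target=worker, args=(0.05,)) for _ in range(20)]
for t in threads:
    t.start()

# Poll without ever calling join(); give up after a generous deadline.
deadline = time.monotonic() + 5.0
while threads and time.monotonic() < deadline:
    threads = prune(threads)
    time.sleep(0.01)

remaining = len(threads)
```

In a real program the prune call would sit wherever the main loop already wakes up, for example after each inotify event, rather than in a polling loop.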

Related

Clean up a thread without .join() and without blocking the main thread

I am in a situation where I have two endpoints I can ask for a value, and one may be faster than the other. The calls to the endpoints are blocking. I want to wait for one to complete and take that result without waiting for the other to complete.
My solution was to issue the requests in separate threads and have those threads set a flag to true when they complete. In the main thread, I continuously check the flags (I know it is a busy wait, but that is not my primary concern right now) and when one completes it takes that value and returns it as the result.
The issue I have is that I never clean up the other thread. I can't find any way to do it without using .join(), which would just block and defeat the purpose of this whole thing. So, how can I clean up that other, slower thread that is blocking without joining it from the main thread?
What you want is to make your threads daemons, so when you get the result and finish your main, the other running thread will be forced to finish. You do that by changing the daemon keyword to True:
tr = threading.Thread(daemon=True)
From the threading docs:
The significance of this flag is that the entire Python program exits
when only daemon threads are left.
Although:
Daemon threads are abruptly stopped at shutdown. Their resources (such
as open files, database transactions, etc.) may not be released
properly. If you want your threads to stop gracefully, make them
non-daemonic and use a suitable signalling mechanism such as an Event.
I don't have any particular experience with Events so can't elaborate on that. Feel free to click the link and read on.
One quick and dirty solution is to give the threads a method that closes the socket they are blocking on. You then have to catch the resulting exception in the thread that was blocked.
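One way to combine the daemon flag with a result hand-off, and to avoid the busy wait mentioned in the question, is to let each thread put its result on a queue.Queue and have the main thread block on the first item. This is only a sketch: call_endpoint, first_result, and the delays are hypothetical placeholders for the real blocking requests.

```python
import queue
import threading
import time

def call_endpoint(name, delay, results):
    # Hypothetical stand-in for a blocking request to an endpoint.
    time.sleep(delay)
    results.put((name, f"value from {name}"))

def first_result(endpoints, timeout=5.0):
    results = queue.Queue()
    for name, delay in endpoints:
        # daemon=True so a slow request does not keep the program alive at exit.
        threading.Thread(
            target=call_endpoint, args=(name, delay, results), daemon=True
        ).start()
    # Blocks until the first worker reports back; raises queue.Empty on timeout.
    return results.get(timeout=timeout)

winner, value = first_result([("slow", 0.5), ("fast", 0.05)])
```

The slower thread keeps running until its call returns; it is merely ignored, and being a daemon it will not hold up interpreter exit.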

Understanding preemptive multitasking with locks and the Python GIL?

I'm reading through Grok The GIL and it has the following statement in the discussion about locking.
So long as no thread holds a lock while it sleeps, does I/O, or some other GIL-dropping operation, you should use the coarsest, simplest locks possible. Other threads couldn't have run in parallel anyway.
It comes just after a discussion about preemptive multitasking. What prevents the preemptive dropping of the GIL from happening while you have a lock? Or is that not what this statement is referring to?
I asked the author of the piece, and it comes down to the difference between dropping the GIL because you are waiting on an external operation and an internal preemption: https://opensource.com/article/17/4/grok-gil#comment-136186
Hi! Nothing prevents a thread from preemptively dropping the GIL while it holds a lock. Let's call that Thread A, and let's say there's also a Thread B. If Thread A holds a lock and gets preempted, then maybe Thread B could run instead of Thread A.
If Thread B is waiting for the lock that Thread A is holding, then Thread B is not waiting for the GIL. In that case Thread A reacquires the GIL immediately after dropping it, and Thread A continues.
If Thread B is not waiting for the lock that Thread A is holding, then Thread B might acquire the GIL and run.
My point about coarse locks, however, is this: no two threads can ever execute Python in parallel, because of the GIL. So using fine-grained locks doesn't improve throughput. This is in contrast to a language like Java or C, where fine-grained locks allow greater parallelism, and therefore greater throughput.
I still needed some clarification, and he did confirm this:
If I'm understanding you correctly, the intent of the statement I referenced was to avoid using locks around external operations, where you could then block multiple threads, if they all depended on that lock.
For the preemptive example, Thread A isn't blocked by anything externally, so the processing just goes back and forth similar to cooperative multitasking.
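To illustrate the advice, here is a small sketch with a single coarse lock that is held only around in-memory work and never around sleep or I/O. The counter, the lock name, and the iteration counts are made up for the example.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # one coarse lock guarding all shared state

def add_many(n):
    global counter
    for _ in range(n):
        # Held only around in-memory work, never around sleep or I/O.
        with counter_lock:
            counter += 1

workers = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

A thread may still be preempted while holding counter_lock; the lock only ensures that no other thread enters the protected block in the meantime, so the final count is exact.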

Is the Python non-daemon thread a non-detached thread? When is its resource freed?

Will the resources of a non-daemon thread be released back to the OS once the thread completes? That is, if the main thread does not call join() on these non-daemon threads, will the Python GC call join on them and release the resources the thread held?
If you spawn a thread that runs a function, and then that function completes before the end of the program, then yes, the thread will get garbage collected once it is (a) no longer running, and (b) no longer referenced by anything else.
"
A thread can be flagged as a “daemon thread”. The significance of this
flag is that the entire Python program exits when only daemon threads
are left. The initial value is inherited from the creating thread. The
flag can be set through the daemon property.
Note Daemon threads are abruptly stopped at shutdown. Their resources
(such as open files, database transactions, etc.) may not be released
properly. If you want your threads to stop gracefully, make them
non-daemonic and use a suitable signalling mechanism such as an Event.
" -- Python Thread Docs
Python stops daemon threads abruptly at interpreter shutdown. It never does that to non-daemonic threads: the interpreter waits for them at exit, so one that is still running has to be signalled to stop. This is useful in executing some complicated parallel code though :)
This does mean, though, that if you use non-daemonic threads you can end up with dummy or useless threads sitting around that you have to clean up manually.
tl;dr Python never forcibly stops a non-daemonic thread. If its function does not return on its own, you have to signal it to finish via your own mechanism, or end the whole process with a signal such as SIGTERM.
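The inheritance rule in the quoted documentation ("the initial value is inherited from the creating thread") can be checked with a short sketch. The function names and the observed dictionary are hypothetical; only threading.Thread and its daemon attribute come from the standard library.

```python
import threading

observed = {}

def child():
    pass

def parent():
    # No daemon argument: the new thread inherits the flag from its creator.
    inherited = threading.Thread(target=child)
    # An explicit daemon argument overrides the inherited value.
    overridden = threading.Thread(target=child, daemon=False)
    observed["inherited"] = inherited.daemon
    observed["overridden"] = overridden.daemon

p = threading.Thread(target=parent, daemon=True)
p.start()
p.join()
```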

How can I kill a thread in python [duplicate]

I start a thread using the following code.
t = thread.start_new_thread(myfunction, ())
How can I kill the thread t from another thread? In terms of code, I want to be able to do something like this:
t.kill()
Note that I'm using Python 2.4.
In Python, you simply cannot kill a Thread.
If you do NOT really need to have a Thread (!), what you can do, instead of using the threading package (http://docs.python.org/2/library/threading.html), is to use the multiprocessing package (http://docs.python.org/2/library/multiprocessing.html). Here, to kill a process, you can simply call the method:
yourProcess.terminate() # kill the process!
Python will kill your process (on Unix through the SIGTERM signal, on Windows through the TerminateProcess() call). Be careful when terminating a process that uses a Queue or a Pipe! (It may corrupt the data in the Queue/Pipe.)
Note that multiprocessing.Event and multiprocessing.Semaphore work in exactly the same way as threading.Event and threading.Semaphore respectively. In fact, the former are clones of the latter.
If you REALLY need to use a Thread, there is no way to kill your threads directly. What you can do, however, is to use a "daemon thread". In fact, in Python, a Thread can be flagged as daemon:
yourThread.daemon = True # set the Thread as a "daemon thread"
The main program will exit when no alive non-daemon threads are left. In other words, when your main thread (which is, of course, a non-daemon thread) finishes its operations, the program will exit even if some daemon threads are still working.
Note that it is necessary to set a Thread as daemon before the start() method is called!
Of course you can, and should, use daemon even with multiprocessing. Here, when the main process exits, it attempts to terminate all of its daemonic child processes.
Finally, please, note that sys.exit() and os.kill() are not choices.
If your thread is busy executing Python code, you have a bigger problem than the inability to kill it. The GIL will prevent any other thread from even running whatever instructions you would use to do the killing. (After a bit of research, I've learned that the interpreter periodically releases the GIL, so the preceding statement is bogus. The remaining comment stands, however.)
Your thread must be written in a cooperative manner. That is, it must periodically check in with a signalling object such as a semaphore, which the main thread can use to instruct the worker thread to voluntarily exit.
while not sema.acquire(False):
    # Do a small portion of work…
or:
for item in work:
    # Keep working…
    # Somewhere deep in the bowels…
    if sema.acquire(False):
        thread.exit()
You can't kill a thread from another thread. You need to signal to the other thread that it should end. And by "signal" I don't mean use the signal function, I mean that you have to arrange for some communication between the threads.
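A minimal sketch of such communication, assuming Python 3 and threading.Event rather than the semaphore used above: the worker checks a shared flag between small units of work and returns on its own once the flag is set. The names stop_requested and steps_done and the timings are hypothetical.

```python
import threading
import time

stop_requested = threading.Event()
steps_done = []

def worker():
    # Check the flag between small units of work and leave voluntarily.
    while not stop_requested.is_set():
        steps_done.append(1)  # hypothetical small portion of work
        time.sleep(0.01)

t = threading.Thread(target=worker)
t.start()

time.sleep(0.1)       # let the worker run briefly
stop_requested.set()  # ask the worker to stop; this does not kill it
t.join(timeout=5.0)   # the worker exits shortly after its next check

stopped = not t.is_alive()
```

The request only takes effect when the worker next checks the flag, so a worker stuck in a long blocking call will not notice it until that call returns.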
