I am new to Python multithreading and am trying to understand the basic difference between joining multiple worker threads and calling abort on them when I am done with them. Can somebody please explain this with an example?
.join() and setting an abort flag are two different steps in cleanly shutting down a thread.
join() just waits for a thread that is going to terminate anyway to be finished. Thus:
import threading
import time

def thread_main():
    time.sleep(10)

t = threading.Thread(target=thread_main)
t.start()
t.join()
This is a reasonable program. The join just waits until the thread is finished. It doesn't do anything to make that happen, but the thread will terminate anyway, because it is just a 10-second sleep.
In contrast
import threading
import time

def thread_main():
    while True:
        time.sleep(10)

t = threading.Thread(target=thread_main)
t.start()
t.join()
This is not a good idea: join will still wait for the thread to terminate on its own, but the thread never will, because it loops forever. Thus the whole program can never terminate.
That's the point where you want some kind of signaling to the thread for it to stop itself:
import threading
import time

stop_thread = False

def thread_main():
    while not stop_thread:
        time.sleep(10)

t = threading.Thread(target=thread_main)
t.start()
stop_thread = True
t.join()
Here stop_thread takes the role of your __abort flag and signals the thread to stop after it has finished its latest piece of work (the sleep(10) in this case).
Thus this program is again reasonable and terminates when asked to.
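As a side note, a plain boolean works here, but the standard library's threading.Event (which also shows up in some answers below) expresses the same flag more idiomatically and doubles as an interruptible sleep. A minimal sketch of the same shutdown with an Event:

import threading

stop_event = threading.Event()

def thread_main():
    # wait() returns as soon as the event is set, instead of
    # sleeping out the full 10 seconds like time.sleep(10) would
    while not stop_event.is_set():
        stop_event.wait(10)

t = threading.Thread(target=thread_main)
t.start()
stop_event.set()  # signal the thread to stop
t.join()          # returns promptly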
Another popular way to signal a thread to stop, when the thread uses a consumer pattern (i.e. gets its work from a queue), is to post a special 'terminate now' work item as an alternative to setting a flag variable:
def thread_main():
    while True:
        (quit, data) = work_queue.get()
        if quit:
            break
        do_work(data)
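For completeness, a minimal runnable sketch of this pattern; work_queue and do_work are stand-ins for your own queue and work function:

import queue
import threading

work_queue = queue.Queue()

def do_work(data):
    print("processing", data)

def thread_main():
    while True:
        quit, data = work_queue.get()
        if quit:
            break
        do_work(data)

t = threading.Thread(target=thread_main)
t.start()
work_queue.put((False, "job 1"))
work_queue.put((False, "job 2"))
work_queue.put((True, None))  # the special 'terminate now' item
t.join()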
The thing I cannot figure out is that although ThreadPoolExecutor uses daemon workers, they will still run even if the main thread exits.
I can provide a minimal example in Python 3.6.4:
import concurrent.futures
import time

def fn():
    while True:
        time.sleep(5)
        print("Hello")

thread_pool = concurrent.futures.ThreadPoolExecutor()
thread_pool.submit(fn)

while True:
    time.sleep(1)
    print("Wow")
Both the main thread and the worker thread run infinite loops. So if I use KeyboardInterrupt to terminate the main thread, I expect that the whole program will terminate too. But actually the worker thread is still running, even though it is a daemon thread.
The source code of ThreadPoolExecutor confirms that worker threads are daemon threads:
t = threading.Thread(target=_worker,
                     args=(weakref.ref(self, weakref_cb),
                           self._work_queue))
t.daemon = True
t.start()
self._threads.add(t)
Further, if I manually create a daemon thread, it works like a charm:
from threading import Thread
import time

def fn():
    while True:
        time.sleep(5)
        print("Hello")

thread = Thread(target=fn)
thread.daemon = True
thread.start()

while True:
    time.sleep(1)
    print("Wow")
So I really cannot figure out this strange behavior.
Suddenly... I found why. Reading more of the source code of ThreadPoolExecutor:
# Workers are created as daemon threads. This is done to allow the interpreter
# to exit when there are still idle threads in a ThreadPoolExecutor's thread
# pool (i.e. shutdown() was not called). However, allowing workers to die with
# the interpreter has two undesirable properties:
#   - The workers would still be running during interpreter shutdown,
#     meaning that they would fail in unpredictable ways.
#   - The workers could be killed while evaluating a work item, which could
#     be bad if the callable being evaluated has external side-effects e.g.
#     writing to a file.
#
# To work around this problem, an exit handler is installed which tells the
# workers to exit when their work queues are empty and then waits until the
# threads finish.

_threads_queues = weakref.WeakKeyDictionary()
_shutdown = False

def _python_exit():
    global _shutdown
    _shutdown = True
    items = list(_threads_queues.items())
    for t, q in items:
        q.put(None)
    for t, q in items:
        t.join()

atexit.register(_python_exit)
There is an exit handler which will join all unfinished workers...
Here's a way to avoid this problem. Bad design can be beaten by another bad design. People write daemon=True only if they really know that the worker won't damage any objects or files.
In my case, I created a ThreadPoolExecutor with a single worker, and after a single submit I just deleted the newly created thread from the queue so that the interpreter won't wait until this thread stops on its own. Notice that worker threads are created after submit, not after the initialization of ThreadPoolExecutor.
import concurrent.futures.thread
from concurrent.futures import ThreadPoolExecutor
...
executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(lambda: self._exec_file(args))
del concurrent.futures.thread._threads_queues[list(executor._threads)[0]]
It works in Python 3.8 but may not work in 3.9+ since this code is accessing private variables.
See the working piece of code on GitHub.
Is there a way in Python to interrupt a thread when it's sleeping?
(As we can do in Java.)
I am looking for something like this:
import threading
from time import sleep

def f():
    print('started')
    try:
        sleep(100)
        print('finished')
    except SleepInterruptedException:
        print('interrupted')

t = threading.Thread(target=f)
t.start()

if input() == 'stop':
    t.interrupt()
The thread sleeps for 100 seconds, and if I type 'stop', it should be interrupted.
The correct approach is to use threading.Event. For example:
import threading
e = threading.Event()
e.wait(timeout=100) # instead of time.sleep(100)
In the other thread, you need to have access to e. You can interrupt the sleep by issuing:
e.set()
This will immediately interrupt the sleep. You can check the return value of e.wait to determine whether it timed out or was interrupted. For more information refer to the documentation: https://docs.python.org/3/library/threading.html#event-objects.
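Putting the pieces together, a minimal runnable version of the scenario from the question (type 'stop' and press Enter to interrupt the sleep):

import threading

e = threading.Event()

def f():
    print('started')
    # wait() returns True if e.set() was called, False if it timed out
    if e.wait(timeout=100):
        print('interrupted')
    else:
        print('finished')

t = threading.Thread(target=f)
t.start()

if input() == 'stop':
    e.set()
t.join()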
How about using condition objects: https://docs.python.org/2/library/threading.html#condition-objects
Instead of sleep() you use wait(timeout). To "interrupt" you call notify().
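A minimal sketch, assuming the sleeping and the notifying thread share the condition object (both wait and notify must be called while holding the condition's lock):

import threading
import time

cond = threading.Condition()

def f():
    with cond:
        cond.wait(timeout=100)  # sleeps until notify() or timeout
    print('woke up')

t = threading.Thread(target=f)
t.start()
time.sleep(1)      # crude: make sure f is already waiting
with cond:
    cond.notify()  # "interrupts" the wait
t.join()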
If you, for whatever reason, need to use the time.sleep function, expect it to throw an exception, and simply want to test what happens with large sleep values without having to wait out the whole timeout...
Firstly, sleeping threads are lightweight and there's no problem just letting them run in daemon mode with threading.Thread(target=f, daemon=True) (so that they exit when the program does). You can check the result of the thread without waiting for the whole execution with t.join(0.5).
But if you absolutely need to halt the execution of the function, you could use multiprocessing.Process, and call .terminate() on the spawned process. This does not give the process time to clean up (e.g. except and finally blocks aren't run), so use it with care.
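For illustration, a minimal sketch of the first suggestion, combining a daemon thread with a timed join:

import threading
import time

def f():
    time.sleep(100)

# daemon=True: this thread will not keep the program alive at exit
t = threading.Thread(target=f, daemon=True)
t.start()

t.join(0.5)            # wait at most half a second
print(t.is_alive())    # True: still sleeping, but we did not block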
I am trying to write a Python multi-threaded script that does the following two things in different threads:
Parent: Start Child Thread, Do some simple task, Stop Child Thread
Child: Do some long running task.
Below is a simple way to do it. And it works for me:
from multiprocessing import Process
import time

def child_func():
    while not stop_thread:
        time.sleep(1)

if __name__ == '__main__':
    child_thread = Process(target=child_func)
    stop_thread = False
    child_thread.start()
    time.sleep(3)
    stop_thread = True
    child_thread.join()
But a complication arises because in actuality, instead of the while-loop in child_func(), I need to run a single long-running process that doesn't stop unless it is killed by Ctrl-C. So I cannot periodically check the value of stop_thread in there. How can I tell my child process to end when I want it to?
I believe the answer has to do with using signals, but I haven't seen a good example of how to use them in this exact situation. Can someone please help by modifying my code above to use signals to communicate between the child and the parent, making the child process terminate when the user hits Ctrl-C?
There is no need to use the signal module here unless you want to do cleanup in your child process. It is possible to stop any child process using the terminate method (which has the same effect as SIGTERM):
from multiprocessing import Process
import time

def child_func():
    time.sleep(1000)

if __name__ == '__main__':
    child_thread = Process(target=child_func)
    child_thread.start()
    time.sleep(3)
    child_thread.terminate()
    child_thread.join()
The docs are here: https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process.terminate
I'm new to threads in Python. I have a question: suppose I start 3 threads like below, each one taking care of a different task:
import thread

def start(taskName, delay):
    # do something with each taskName
    pass

# Create three threads as follows
try:
    thread.start_new_thread( start, ("task1", 1) )
    thread.start_new_thread( start, ("task2", 1) )
    thread.start_new_thread( start, ("task3", 1) )
except:
    print "Error: unable to start thread"
Suppose that each "start" takes around 10-15 seconds to finish, depending on its taskName. My question is: if task 1 finishes in 12 seconds, task 2 in 10 seconds and task 3 in 15 seconds, will task 2 finish, then close and leave task 1 and task 3 to run until they finish, or will task 2 force tasks 1 and 3 to close once task 2 is finished?
Are there any arguments that we can pass to the start_new_thread method in order to achieve the two behaviors I have mentioned above?
1. First to finish forces the rest to close.
2. Each one finishes individually.
Thank you
As Max Noel already mentioned, it is advised to use the Thread class instead of using start_new_thread.
Now, as for your two questions:
1. First to finish forces the rest to close
You will need two important things: a shared queue that the threads can put their ID in once they are done. And a shared Event that will signal all threads to stop working when it is triggered. The main thread will wait for the first thread to put something in the queue and will then trigger the event to stop all threads.
import threading
import random
import time
import Queue

def work(worker_queue, id, stop_event):
    while not stop_event.is_set():
        print "This is worker", id
        # do stuff
        time.sleep(random.random() * 5)
        # put worker ID in queue
        if not stop_event.is_set():
            worker_queue.put(id)
            break

# queue for workers
worker_queue = Queue.Queue()
# indicator for other threads to stop
stop_event = threading.Event()

# run workers
threads = []
threads.append(threading.Thread(target=work, args=(worker_queue, 0, stop_event)))
threads.append(threading.Thread(target=work, args=(worker_queue, 1, stop_event)))
threads.append(threading.Thread(target=work, args=(worker_queue, 2, stop_event)))
for thread in threads:
    thread.start()

# this will block until the first element is in the queue
first_finished = worker_queue.get()
print first_finished, 'was first!'

# signal the rest to stop working
stop_event.set()
2. Each one finishes individually
Now this is much easier. Just call the join method on all Thread objects. This will wait for each thread to finish.
for thread in threads:
    thread.start()

for thread in threads:
    thread.join()
Btw, the above code is for Python 2.7. Let me know if you need the Python 3 version.
First off, don't use start_new_thread, it's a low-level primitive. Use the Thread class in the threading module instead.
Once you have that, Thread instances have a .join() method, which you can call from another thread (your program's main thread) to wait for them to terminate.
t1 = Thread(target=my_func)
t1.start()
# Waits for t1 to finish.
t1.join()
All threads will terminate when the process terminates.
Thus, if your main program ends after the try..except, then all three threads may get terminated prematurely. For example:
import thread
import logging
import time

logger = logging.getLogger(__name__)

def start(taskname, n):
    for i in range(n):
        logger.info('{}'.format(i))
        time.sleep(0.1)

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG,
                        format='[%(asctime)s %(threadName)s] %(message)s',
                        datefmt='%H:%M:%S')
    try:
        thread.start_new_thread( start, ("task1", 10) )
        thread.start_new_thread( start, ("task2", 5) )
        thread.start_new_thread( start, ("task3", 8) )
    except Exception as err:
        logger.exception(err)
may print something like
[14:15:16 Dummy-3] 0
[14:15:16 Dummy-1] 0
In contrast, if you place
time.sleep(5)
at the end of the script, then you see the full expected output from all three threads.
Note also that the thread module is a low-level module; unless you have a
particular reason for using it, most often people use the threading module which
implements more useful features for dealing with threads, such as a join
method which blocks until the thread has finished. See below for an example.
The docs state:
When the function returns, the thread silently exits.
When the function terminates with an unhandled exception, a stack trace is
printed and then the thread exits (but other threads continue to run).
Thus, by default, when one thread finishes, the others continue to run.
The example above also demonstrates this.
To make all the threads exit when one function finishes is more difficult.
One thread can not kill another thread cleanly (e.g., without killing the entire
process.)
Using threading, you could arrange for the threads to set a variable
(e.g. flag) to True when finished, and have each thread check the state of
flag periodically and quit if it is True. But note that the other threads will
not necessarily terminate immediately; they will only terminate when they next
check the state of flag. If a thread is blocked, waiting for I/O for instance,
then it may not check the flag for a considerable amount of time (if ever!).
However, if the thread spends most of its time in a quick loop, you could check the state of flag once per iteration:
import threading
import logging
import time

logger = logging.getLogger(__name__)

def start(taskname, n):
    global flag
    for i in range(n):
        if flag:
            break
        logger.info('{}'.format(i))
        time.sleep(0.1)
    else:
        # get here if loop finishes without breaking
        logger.info('FINISHED')
        flag = True

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG,
                        format='[%(asctime)s %(threadName)s] %(message)s',
                        datefmt='%H:%M:%S')
    threads = list()
    flag = False
    try:
        threads.append(threading.Thread(target=start, args=("task1", 10) ))
        threads.append(threading.Thread(target=start, args=("task2", 5) ))
        threads.append(threading.Thread(target=start, args=("task3", 8) ))
    except Exception as err:
        logger.exception(err)

    for t in threads:
        t.start()

    for t in threads:
        # make the main process wait until all threads have finished.
        t.join()
I'm trying to run the following code (it is simplified a bit):
def RunTests(self):
    from threading import Thread
    import signal

    global keep_testing
    keep_testing = True
    signal.signal(signal.SIGINT, stop_running)
    for i in range(0, NumThreads):
        thread = Thread(target=foo)
        self._threads.append(thread)
        thread.start()
    # wait for all threads to finish
    for t in self._threads:
        t.join()

def stop_running(signl, frme):
    global keep_testing
    keep_testing = False
    print "Interrupted by the Master. Good bye!"
    return 0

def foo():
    global keep_testing
    while keep_testing:
        DO_SOME_WORK()
I expect that when the user presses Ctrl+C, the program will print the goodbye message and stop. However, it doesn't work. Where is the problem?
Thanks
Unlike regular processes, Python doesn't appear to handle signals in a truly asynchronous manner. The 'join()' call is somehow blocking the main thread in a manner that prevents it from responding to the signal. I'm a bit surprised by this since I don't see anything in the documentation indicating that this can/should happen. The solution, however, is simple. In your main thread, add the following loop prior to calling 'join()' on the threads:
while keep_testing:
    signal.pause()
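For illustration, a self-contained sketch of the corrected flow (signal.pause is Unix-only, and the loop body is a stand-in for DO_SOME_WORK):

import signal
import threading
import time

keep_testing = True

def stop_running(signum, frame):
    global keep_testing
    keep_testing = False
    print("Interrupted by the Master. Good bye!")

def foo():
    while keep_testing:
        time.sleep(0.1)  # stand-in for DO_SOME_WORK()

signal.signal(signal.SIGINT, stop_running)
threads = [threading.Thread(target=foo) for _ in range(4)]
for t in threads:
    t.start()

# keep the main thread in signal.pause() so it can react to SIGINT;
# join() alone would block signal handling as described above
while keep_testing:
    signal.pause()

for t in threads:
    t.join()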
From the threading docs:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property.
You could try setting thread.daemon = True before calling start() and see if that solves your problem.
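For example, a minimal sketch under the assumption that the worker loop needs no cleanup (daemon threads are killed abruptly when the program exits):

import threading
import time

def foo():
    while True:
        time.sleep(0.1)  # stand-in for DO_SOME_WORK()

threads = []
for _ in range(4):
    t = threading.Thread(target=foo)
    t.daemon = True  # must be set before start()
    t.start()
    threads.append(t)

time.sleep(3)
# no join: when the main thread returns, Python does not wait for
# daemon threads, so the whole process exits here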