I have a multithreaded Python program (financial trading) in which certain threads execute critical sections (like in the middle of executing a trade). The thread executing the critical sections are daemon threads. The main thread of the program captures SIGINT and tries to exit the program gracefully by releasing all resources held by child threads. In order to prevent the main thread causing the child threads to terminate abruptly; the main the thread will loop through the list of child thread objects and call their shutdown() function. This function will block till a critical section of the thread completes before returning.
The following is a basic implementation
class ChildDaemonThread(Thread):
def __init__(self):
self._critical_section = False
# other initialisations
def shutdown(self):
# called by parent thread before calling sys.exit(0)
while True:
if not self._critical_section:
break
# add code to prevent entering critical section
# do resource deallocation
def do_critical_stuff(self):
self._critical_section = True
# do critical stuff
self._critical_section = False
def run(self):
while True:
self._do_critical_stuff()
I am not sure if my implementation will work because while the ChildDaemonThread is executing critical section through do_critical_stuff(), if the parent thread calls the child's shutdown(), which blocks till a critical section executes, then at this point two methods of the ChildDaemonThread run() and do_critical_stuff() are called at the same time (I am not sure if this is even legal). Is this possible? Is my implementation correct? Is there a better way to achieve this?
There are some race conditions in this implementation.
You have no guarantee that the main thread will check the value of _critical_section at the right time to see a False value. The worker thread may leave and re-enter the critical section before the main thread gets around to checking the value again. This may not cause any issues of correctness but it could cause your program to take longer to shut down (since when the main thread "misses" a safe time to shut down it will have to wait for another critical section to complete).
Additionally, the worker thread may re-enter the critical after the main thread has noticed that _critical_section is False but before the main thread manages to cause the process to exit. This could pose real correctness issues since it effectively breaks your attempt to make sure the critical section completes.
Of course, the program may also crash due to some other issue. Therefore, it may be better if you implement the ability to recover from an interrupted critical section.
However, if you want to improve this strategy to the greatest extent possible, I would suggest something more like this:
class ChildDaemonThread(Thread):
def __init__(self):
self._keep_running = True
# other initialisations
def shutdown(self):
# called by parent thread before calling sys.exit(0)
self._keep_running = False
def do_critical_stuff(self):
# do critical stuff
def run(self):
while self._keep_running:
self._do_critical_stuff()
# do resource deallocation
workers = [ChildDaemonThread(), ...]
# Install your SIGINT handler which calls shutdown on all of workers
# ...
# Start all the workers
for w in workers:
w.start()
# Wait for the run method of all the workers to return
for w in workers:
w.join()
The key here is that join will block until the thread has finished. This ensures you're not interrupting one mid-critical-section.
Related
In my code (a complex GUI application with Tkinter) I have a thread defined in a custom object (a progress bar). It runs a function with a while cicle like this:
def Start(self):
while self.is_active==True:
do it..
time.sleep(1)
do it..
time.sleep(1)
def Stop(self):
self.is_active=False
It can terminate only when another piece of code, placed in another thread, changes the attribute self.is_active using the method self.Stop(). I have the same situation in another custom object (a counter) and both of them have to work together when the another thread (the main one) works.
The code works, but I realized that the two threads associated with the progress bar and the counter don't terminate instantly as I wanted, because before to temrinate, they need to wait the end of their functions, and these ones are slow becose of the time.sleep(1) instructions. From the user point of view, it means see the end of the main thread with the progress bar and the cunter that terminate LATE and I don't like it.
To be honest I don't know how to solve this issue. Is there a way to force a thread to terminate instantly without waiting the end of the function?
First off, to be clear, hard-killing a thread is a terrible idea in any language, and Python doesn't support it; if nothing else, the risk of that thread holding a lock which is never unlocked, causing any thread that tries to acquire it to deadlock, is a fatal flaw.
If you don't care about the thread at all, you can create it with the daemon=True argument, and it will die if all non-daemon threads in the process have exited. But if the thread really should die with proper cleanup (e.g. it might have with statements or the like that manage cleanup of resources outside the process, that won't be cleaned up on process termination), that's not a real solution.
That said, you can avoid waiting a second or more by switching from using a plain bool and time.sleep to using an Event and using the .wait method on it. This will allow the "sleeps" to be interrupted immediately, at the small expense of requiring you to reverse your condition (because Event.wait only blocks while it's false/unset, so you need the flag to be based on when you should stop, not when you are currently active):
class Spam:
def __init__(self):
self.should_stop = threading.Event() # Create an unset event on init
def Start(self):
while not self.should_stop.is_set():
# do it..
if self.should_stop.wait(1):
break
# do it..
if self.should_stop.wait(1):
break
def Stop(self):
self.should_stop.set()
On modern Python (3.1 and higher) the wait method returns True if the event was set (on beginning the wait or because it got set while waiting), and False otherwise, so whenever wait returns True, that means you were told to stop and you can immediately break out of the loop. You also get notified almost immediately, instead of waiting up to one second before you can check the flag.
This won't cause the real "do it.." code to exit immediately, but from what you said, it sounds like that part of the code isn't all that long, so waiting for it to complete isn't a big hassle.
If you really want to preserve the is_active attribute for testing whether it's still active, you can define it as a property that reverses the meaning of the Event, e.g.:
#property
def is_active(self):
return not self.should_stop.is_set()
the safest way to do it without risking a segmentation fault, is to return.
def Start(self):
while self.is_active==True:
do it..
if not self.is_active: return
time.sleep(1)
if not self.is_active: return
do it..
if not self.is_active: return
time.sleep(1)
def Stop(self):
self.is_active=False
python threads need to free the associated resources, and while "killing" the thread is possible using some C tricks, you will be risking a segmentation fault or a memory leak.
here is a cleaner way to do it.
class MyError(Exception):
pass
def Start(self):
try:
while self.is_active==True:
do it..
self.check_termination()
time.sleep(1)
self.check_termination()
do it..
self.check_termination()
time.sleep(1)
except MyError:
return
def check_termination(self):
if not self.is_active:
raise MyError
and you can call self.check_termination() from inside any function to terminate this loop, not necessarily from inside Start directly.
Edit: ShadowRanger solution handles the "interruptable wait" better, i am just keeping this for implementing a kill switch for the thread that can be checked from anywhere inside the thread.
The thing I cannot figure out is that although ThreadPoolExecutor uses daemon workers, they will still run even if main thread exit.
I can provide a minimal example in python3.6.4:
import concurrent.futures
import time
def fn():
while True:
time.sleep(5)
print("Hello")
thread_pool = concurrent.futures.ThreadPoolExecutor()
thread_pool.submit(fn)
while True:
time.sleep(1)
print("Wow")
Both main thread and the worker thread are infinite loops. So if I use KeyboardInterrupt to terminate main thread, I expect that the whole program will terminate too. But actually the worker thread is still running even though it is a daemon thread.
The source code of ThreadPoolExecutor confirms that worker threads are daemon thread:
t = threading.Thread(target=_worker,
args=(weakref.ref(self, weakref_cb),
self._work_queue))
t.daemon = True
t.start()
self._threads.add(t)
Further, if I manually create a daemon thread, it works like a charm:
from threading import Thread
import time
def fn():
while True:
time.sleep(5)
print("Hello")
thread = Thread(target=fn)
thread.daemon = True
thread.start()
while True:
time.sleep(1)
print("Wow")
So I really cannot figure out this strange behavior.
Suddenly... I found why. According to much more source code of ThreadPoolExecutor:
# Workers are created as daemon threads. This is done to allow the interpreter
# to exit when there are still idle threads in a ThreadPoolExecutor's thread
# pool (i.e. shutdown() was not called). However, allowing workers to die with
# the interpreter has two undesirable properties:
# - The workers would still be running during interpreter shutdown,
# meaning that they would fail in unpredictable ways.
# - The workers could be killed while evaluating a work item, which could
# be bad if the callable being evaluated has external side-effects e.g.
# writing to a file.
#
# To work around this problem, an exit handler is installed which tells the
# workers to exit when their work queues are empty and then waits until the
# threads finish.
_threads_queues = weakref.WeakKeyDictionary()
_shutdown = False
def _python_exit():
global _shutdown
_shutdown = True
items = list(_threads_queues.items())
for t, q in items:
q.put(None)
for t, q in items:
t.join()
atexit.register(_python_exit)
There is an exit handler which will join all unfinished worker...
Here's the way to avoid this problem. Bad design can be beaten by another bad design. People write daemon=True only if they really know that the worker won't damage any objects or files.
In my case, I created TreadPoolExecutor with a single worker and after a single submit I just deleted the newly created thread from the queue so the interpreter won't wait till this thread stops on its own. Notice that worker threads are created after submit, not after the initialization of TreadPoolExecutor.
import concurrent.futures.thread
from concurrent.futures import ThreadPoolExecutor
...
executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(lambda: self._exec_file(args))
del concurrent.futures.thread._threads_queues[list(executor._threads)[0]]
It works in Python 3.8 but may not work in 3.9+ since this code is accessing private variables.
See the working piece of code on github
I have a python multiprocessing setup (i.e. worker processes) with custom signal handling, which prevents the worker from cleanly using multiprocessing itself. (See extended problem description below).
The Setup
The master class that spawns all worker processes looks like the following (some parts stripped to only contain the important parts).
Here, it re-binds its own signals only to print Master teardown; actually the received signals are propagated down the process tree and must be handled by the workers themselves. This is achieved by re-binding the signals after workers have been spawned.
class Midlayer(object):
def __init__(self, nprocs=2):
self.nprocs = nprocs
self.procs = []
def handle_signal(self, signum, frame):
log.info('Master teardown')
for p in self.procs:
p.join()
sys.exit()
def start(self):
# Start desired number of workers
for _ in range(nprocs):
p = Worker()
self.procs.append(p)
p.start()
# Bind signals for master AFTER workers have been spawned and started
signal.signal(signal.SIGINT, self.handle_signal)
signal.signal(signal.SIGTERM, self.handle_signal)
# Serve forever, only exit on signals
for p in self.procs:
p.join()
The worker class bases multiprocessing.Process and implements its own run()-method.
In this method, it connects to a distributed message queue and polls the queue for items forever. Forever should be: until the worker receives SIGINT or SIGTERM. The worker should not quit immediately; instead, it has to finish whatever calculation it does and will quit afterwards (once quit_req is set to True).
class Worker(Process):
def __init__(self):
self.quit_req = False
Process.__init__(self)
def handle_signal(self, signum, frame):
print('Stopping worker (pid: {})'.format(self.pid))
self.quit_req = True
def run(self):
# Set signals for worker process
signal.signal(signal.SIGINT, self.handle_signal)
signal.signal(signal.SIGTERM, self.handle_signal)
q = connect_to_some_distributed_message_queue()
# Start consuming
print('Starting worker (pid: {})'.format(self.pid))
while not self.quit_req:
message = q.poll()
if len(message):
try:
print('{} handling message "{}"'.format(
self.pid, message)
)
# Facade pattern: Pick the correct target function for the
# requested message and execute it.
MessageRouter.route(message)
except Exception as e:
print('{} failed handling "{}": {}'.format(
self.pid, message, e.message)
)
The Problem
So far for the basic setup, where (almost) everything works fine:
The master process spawns the desired number of workers
Each worker connects to the message queue
Once a message is published, one of the workers receives it
The facade pattern (using a class named MessageRouter) routes the received message to the respective function and executes it
Now for the problem: Target functions (where the message gets directed to by the MessageRouter facade) may contain very complex business logic and thus may require multiprocessing.
If, for example, the target function contains something like this:
nproc = 4
# Spawn a pool, because we have expensive calculation here
p = Pool(processes=nproc)
# Collect result proxy objects for async apply calls to 'some_expensive_calculation'
rpx = [p.apply_async(some_expensive_calculation, ()) for _ in range(nproc)]
# Collect results from all processes
res = [rpx.get(timeout=.5) for r in rpx]
# Print all results
print(res)
Then the processes spawned by the Pool will also redirect their signal handling for SIGINT and SIGTERM to the worker's handle_signal function (because of signal propagation to the process subtree), essentially printing Stopping worker (pid: ...) and not stopping at all. I know, that this happens due to the fact that I have re-bound the signals for the worker before its own child-processes are spawned.
This is where I'm stuck: I just cannot set the workers' signals after spawning its child processes, because I do not know whether or not it spawns some (target functions are masked and may be written by others), and because the worker stays (as designed) in its poll-loop. At the same time, I cannot expect the implementation of a target function that uses multiprocessing to re-bind its own signal handlers to (whatever) default values.
Currently, I feel like restoring signal handlers in each loop in the worker (before the message is routed to its target function) and resetting them after the function has returned is the only option, but it simply feels wrong.
Do I miss something? Do you have any advice? I'd be really happy if someone could give me a hint on how to solve the flaws of my design here!
There is not a clear approach for tackling the issue in the way you want to proceed. I often find myself in situations where I have to run unknown code (represented as Python entry point functions which might get down into some C weirdness) in multiprocessing environments.
This is how I approach the problem.
The main loop
Usually the main loop is pretty simple, it fetches a task from some source (HTTP, Pipe, Rabbit Queue..) and submits it to a Pool of workers. I make sure the KeyboardInterrupt exception is correctly handled to shutdown the service.
try:
while 1:
task = get_next_task()
service.process(task)
except KeyboardInterrupt:
service.wait_for_pending_tasks()
logging.info("Sayonara!")
The workers
The workers are managed by a Pool of workers from either multiprocessing.Pool or from concurrent.futures.ProcessPoolExecutor. If I need more advanced features such as timeout support I either use billiard or pebble.
Each worker will ignore SIGINT as recommended here. SIGTERM is left as default.
The service
The service is controlled either by systemd or supervisord. In either cases, I make sure that the termination request is always delivered as a SIGINT (CTL+C).
I want to keep SIGTERM as an emergency shutdown rather than relying only on SIGKILL for that. SIGKILL is not portable and some platforms do not implement it.
"I whish it was that simple"
If things are more complex, I'd consider the use of frameworks such as Luigi or Celery.
In general, reinventing the wheel on such things is quite detrimental and gives little gratifications. Especially if someone else will have to look at that code.
The latter sentence does not apply if your aim is to learn how these things are done of course.
I was able to do this using Python 3 and set_start_method(method) with the 'forkserver' flavour. Another way Python 3 > Python 2!
Where by "this" I mean:
Have a main process with its own signal handler which just joins the children.
Have some worker processes with a signal handler which may spawn...
further subprocesses which do not have a signal handler.
The behaviour on Ctrl-C is then:
manager process waits for workers to exit.
workers run their signal handlers, (an maybe set a stop flag and continue executing to finish their job, although I didn't bother in my example, I just joined the child I knew I had) and then exit.
all children of the workers die immediately.
Of course note that if your intention is for the children of the workers not to crash you will need to install some ignore handler or something for them in your worker process run() method, or somewhere.
To mercilessly lift from the docs:
When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.
Available on Unix platforms which support passing file descriptors over Unix pipes.
The idea is therefore that the "server process" inherits the default signal handling behaviour before you install your new ones, so all its children also have default handling.
Code in all its glory:
from multiprocessing import Process, set_start_method
import sys
from signal import signal, SIGINT
from time import sleep
class NormalWorker(Process):
def run(self):
while True:
print('%d %s work' % (self.pid, type(self).__name__))
sleep(1)
class SpawningWorker(Process):
def handle_signal(self, signum, frame):
print('%d %s handling signal %r' % (
self.pid, type(self).__name__, signum))
def run(self):
signal(SIGINT, self.handle_signal)
sub = NormalWorker()
sub.start()
print('%d joining %d' % (self.pid, sub.pid))
sub.join()
print('%d %s joined sub worker' % (self.pid, type(self).__name__))
def main():
set_start_method('forkserver')
processes = [SpawningWorker() for ii in range(5)]
for pp in processes:
pp.start()
def sig_handler(signum, frame):
print('main handling signal %d' % signum)
for pp in processes:
pp.join()
print('main out')
sys.exit()
signal(SIGINT, sig_handler)
while True:
sleep(1.0)
if __name__ == '__main__':
main()
Since my previous answer was python 3 only, I thought I'd also suggest a more dirty method for fun which should work on both python 2 and python 3. Not Windows though...
multiprocessing just uses os.fork() under the covers, so patch it to reset the signal handling in the child:
import os
from signal import SIGINT, SIG_DFL
def patch_fork():
print('Patching fork')
os_fork = os.fork
def my_fork():
print('Fork fork fork')
cpid = os_fork()
if cpid == 0:
# child
signal(SIGINT, SIG_DFL)
return cpid
os.fork = my_fork
You can call that at the start of the run method of your Worker processes (so that you don't affect the Manager) and so be sure that any children will ignore those signals.
This might seem crazy, but if you're not too concerned about portability it might actually not be a bad idea as it's simple and probably pretty resilient over different python versions.
You can store pid of main process (when register signal handler) and use it inside signal handler to route execution flow:
if os.getpid() != main_pid:
sys.exit(128 + signum)
I can't get my Python app to exit. After a call to sys.exit(), python.exe stays running and I have to kill it with task manager.
I've spent the past 4 hours looking into this, and I'm stumped.
This is Python 3.4.4 on Windows 10 x86.
First, I do have a multithreaded application. However I have verified that all threads are exiting with only the main thread running before I call sys.exit(). (I did this by calling threading.enumerate() in a while loop and waiting until there's only the main thread remaining, printing the list of running threads and watching it get smaller on each loop until only the main thread remains.)
Also, I've confirmed that I don't have anything wrapped in a try: block that would be swallowing the SystemExit exception. If I print sys.exc_info() I get (None, None, None), and if I call raise then it also confirms there are no exceptions pending.
What's interesting is that I've narrowed this down to the offending thread by commenting out different parts of my app to disable each thread one-by-one. (I have 4 threads total, each doing different things.)
If I comment out the thread in question, I can quit my app no problem. But again, even when I have that thread running, that thread does successfully exit, there's just something in there that's preventing the main Python exe from exiting.
I've tried setting the daemon flag, but that doesn't do anything either way. The offending thread's purpose is to wait at a PriorityQueue() with a 1 second timeout, and then when that times out it checks a threading.Event() flag to exit itself gracefully. Again, that works fine. I can see in my while() loop while the program is exiting that that thread is running, then stops.
The only other information is this application is launched via a console_scripts entry. I've looked at the script file that setuptools creates and see that just wraps the call to my entry point in a sys.exit(), but even hacking that file, I just cannot get this thing to exit.
I've tried calling sys.exit, raising SystemExit, and simply returning to let the console_script call sys.exit. None of those work.
I've also tried more brute force efforts, like os._exit(), but that also doesn't work.
What's really weird is that if I create a recursive loop (a simple one-line method that just calls itself), and I put that in my stop method before I set my threading Event which stops the threads, then Python will exit as it should. (I did that by mistake and first and was dumbfounded that that works. But if I move that loop call down a few lines to just before I call sys.exit, then the recursive loop doesn't kill python.exe. So even though my problem thread exits properly, something about it trying to exit is causing Python.exe to hang.
So, my question, does anyone have any other ideas or things to try about why Python won't exit? Specifically why my problem thread stops and only the main thread remains, yet sys.exit or os._exit() do nothing? I'm completely stumped.
My app consumes about 90MB of memory, and in task manager, I can see the GC doing its job as when my app is "hung" after the sys.exit() call, I see the memory usage drop from 90MB to 0.1MB over the course of about 30 seconds. But even after leaving it, python.exe doesn't stop.
Update: Here's some code that demonstrates what things look like:
From the module and function that's registered as the console_script:
def run_from_command_line(args=None):
path = os.path.abspath(os.path.curdir)
CommandLineUtility(path).execute()
From the CommandLineUtility() which starts my app. This is the last line:
def __init__(...):
... skipping a bunch of setup stuff
MpfMc(options=vars(args), config=mpf_config,
machine_path=machine_path).run() # this is not a threading run, just the name of the method for my app
From MpfMc():
def __init__(...):
...
self.thread_stopper = threading.Event()
...
self.asset_manager = AssetManager(self)
From AssetManager():
self.loader_thread = AssetLoader(loader_queue=self.loader_queue,
loaded_queue=self.loaded_queue,
exception_queue=self.machine.crash_queue,
thread_stopper=self.machine.thread_stopper)
self.loader_thread.daemon = True
self.loader_thread.start()
From AssetLoader:
def run(self):
"""Run loop for the loader thread."""
while True:
try:
asset = self.loader_queue.get(block=True, timeout=1)
except Empty:
asset = None
if self.thread_stopper.is_set():
return
if asset:
if not asset.loaded:
with asset.lock:
asset.do_load()
self.loaded_queue.put(asset)
From the MpfMc.stop() method that stops the app:
def stop(self):
self.log.info("Stopping ...")
self.thread_stopper.set()
while [x for x in self.threads if x.is_alive()]:
# self.threads is a list of threads I created, not the main thread.
print("Waiting for threads to stop")
print([x for x in self.threads if x.is_alive()])
print(threading.enumerate())
time.sleep(0.5)
for thread in self.threads:
# verify none of the sub threads are alive
print("THREAD", thread, thread.is_alive())
sys.exit() # here's where I also tried raise SystemExit, os._exit(), etc
Thanks!
I have a function I'm calling every 5 seconds like such:
def check_buzz(super_buzz_words):
print 'Checking buzz'
t = Timer(5.0, check_buzz, args=(super_buzz_words,))
t.dameon = True
t.start()
buzz_word = get_buzz_word()
if buzz_word is not 'fail':
super_buzz_words.put(buzz_word)
main()
check_buzz()
I'm exiting the script by either catching a KeyboardInterrupt or by catching a System exit and calling this:
sys.exit('\nShutting Down\n')
I'm also restarting the program every so often by calling:
execv(sys.executable, [sys.executable] + sys.argv)
My question is, how do I get that timer thread to shut off? If I keyboard interrupt, the timer keeps going.
I think you just spelled daemon wrong, it should have been:
t.daemon = True
Then sys.exit() should work
Expanding on the answer from notorious.no, and the comment asking:
How can I call t.cancel() if I have no access to t oustide the
function?
Give the Timer thread a distinct name when you first create it:
import threading
def check_buzz(super_buzz_words):
print 'Checking buzz'
t = Timer(5.0, check_buzz, args=(super_buzz_words,))
t.daemon = True
t.name = "check_buzz_daemon"
t.start()
Although the local variable t soon goes out of scope, the Timer thread that t pointed to still exists and still retains the name assigned to it.
Your atexit-registered method can then identify this thread by its name and cancel it:
from atexit import register
def all_done():
for thr in threading._enumerate():
if thr.name == "check_buzz_daemon":
if thr.is_alive():
thr.cancel()
thr.join()
register(all_done)
Calling join() after calling cancel()is based on a StackOverflow answer by Cédric Julien.
HOWEVER, your thread is set to be a Daemon. According to this StackOverflow post, daemon threads do not need to be explicitly terminated.
from atexit import register
def all_done():
if t.is_alive():
# do something that will close your thread gracefully
register(all_done)
Basically when your code is about to exit, it will fire one last function and this is where you will check if your thread is still running. If it is, do something that will either cancel the transaction or otherwise exit gracefully. In general, it's best to let threads finish by themselves, but if it's not doing anything important (please note the emphasis) than you can just do t.cancel(). Design your code so that threads will finish on their own if possible.
Another way would be to use the Queue() module to send and recieve info from a thread using the .put() outside the thread and the .get() inside the thread.
What you can also do is create a txt file and make program write to it when you exit And put an if statement in the thread function to check it after each iteration (this is not a really good solution but it also works)
I would have put a code exemple but i am writing from mobile sorry