I am working on a project where I have a pool of workers. I am not using the built-in multiprocessing.Pool, but have created my own process pool.
The way it works is that I have created two instances of multiprocessing.Queue - one for sending work tasks to the workers and another to receive the results back.
Each worker just sits in a permanently running loop like this:
while True:
    try:
        request = self.request_queue.get(True, 5)
    except Queue.Empty:
        continue
    else:
        result = request.callable(*request.args, **request.kwargs)
        self.results_queue.put((request, result))
There is also some error-handling code, but I have left it out for brevity. Each worker process has its daemon flag set to True.
I want to shut down the main process and all child worker processes cleanly. My experiences so far (doing Ctrl+C):
With no special handling, each child process stops/crashes with a KeyboardInterrupt traceback, but the main process does not exit and has to be killed (sudo kill -9).
If I implement a signal handler for the child processes that ignores SIGINT, the main thread shows the KeyboardInterrupt traceback, but nothing happens either way.
If I implement a signal handler for both the child processes and the main process, I can see that the signal handler is called in the main process, but calling sys.exit() does not seem to have any effect.
I am looking for a "best practice" way of handling this. I also read somewhere that shutting down processes that were interacting with Queues and Pipes might cause them to deadlock with other processes (due to the Semaphores and other stuff used internally).
My current approach would be the following:
- Find a way to send an internal signal to each process (using a separate command queue or similar) that will terminate its main loop.
- Implement a signal handler in the main process that sends the shutdown command. The child processes will have a signal handler that makes them ignore SIGINT.
Is this the right approach?
The thing you need to watch out for is the possibility that there are messages in the queues at the time you want to shut down, so you need a way for your processes to drain their input queues cleanly. Assuming that your main process is the one that will recognize that it is time to shut down, you could do this.
Send a sentinel to each worker process. This is a special message (frequently None) that can never look like a normal message. After the sentinel, flush and close the queue to each worker process.
In your worker processes use code similar to the following pseudocode:
while True:                    # Your main processing loop
    msg = inqueue.dequeue()    # A blocking wait
    if msg is None:
        break
    do_something()
outqueue.flush()
outqueue.close()
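On the sending side, the shutdown might look roughly like this (a sketch only; shutdown_pool, request_queue, and workers are illustrative names, not taken from the question's code):
def shutdown_pool(request_queue, workers):
    # One sentinel per worker: each worker consumes exactly one None and exits.
    for _ in workers:
        request_queue.put(None)
    # Flush anything buffered to the underlying pipe and close our end of the queue.
    request_queue.close()
    request_queue.join_thread()
    # Wait for every worker to drain the queue and leave its loop.
    for worker in workers:
        worker.join()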
If it is possible that several processes could be sending messages on the inqueue, you will need a more sophisticated approach. This sample, taken from the source code of the monitor method of logging.handlers.QueueListener (Python 3.2 or later), shows one possibility.
"""
Monitor the queue for records, and ask the handler
to deal with them.
This method runs on a separate, internal thread.
The thread will terminate if it sees a sentinel object in the queue.
"""
q = self.queue
has_task_done = hasattr(q, 'task_done')
# self._stop is a multiprocessing.Event object that has been set by the
# main process as part of the shutdown processing, before sending
# the sentinel
while not self._stop.isSet():
try:
record = self.dequeue(True)
if record is self._sentinel:
break
self.handle(record)
if has_task_done:
q.task_done()
except queue.Empty:
pass
# There might still be records in the queue.
while True:
try:
record = self.dequeue(False)
if record is self._sentinel:
break
self.handle(record)
if has_task_done:
q.task_done()
except queue.Empty:
break
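The stop side in the main process then mirrors roughly what QueueListener.stop() does: set the event, enqueue the sentinel, and join the monitoring thread. A sketch along those lines (attribute names follow the snippet above; the _monitor_thread handle name is illustrative):
def stop(self):
    self._stop.set()                        # tell monitor() to leave its first loop
    self.queue.put_nowait(self._sentinel)   # wake it up if it is blocked waiting
    self._monitor_thread.join()             # wait for the drain loop to finish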
I've been trying to find an implementation that looks like mine but I can't seem to find one.
Specifics: I retrieve some database records and want to process all of them in a maximum of 5 threads. But I want these threads to report any potential errors and then close the individual threads (or log them). So I want to push all the records onto a queue and have the threads fetch from the queue.
So far I have this.
class DatabaseRecordImporterThread(threading.Thread):

    def __init__(self, record_queue):
        super(DatabaseRecordImporterThread, self).__init__()
        self.record_queue = record_queue

    def run(self):
        try:
            record = self.record_queue.get()
            force_key_error(record)
        except Exception as e:
            print("Thread failed: ", e)  # I want this to print to the main thread stdout
            logger.log(e)                # I want this to log to a shared log file (with appending)
MAX_THREAD_COUNT = 5

jobs = queue.Queue()
workers = []
database_records_retrieved = database.get_records(query)  # unimportant

# this is where I put all records on a queue
for record in database_records_retrieved:
    jobs.put(record)

for _ in range(MAX_THREAD_COUNT):
    worker = DatabaseRecordImporterThread(jobs)
    worker.start()
    workers.append(worker)

print('*** Main thread waiting')
jobs.join()
print('*** Done')
So the idea is that every thread gets the jobs queue, retrieves records from it, and prints them. Since the amount to process isn't predesignated (there's no rule like "each thread handles k records"), each thread just attempts to process whatever is on the queue. However, the output looks like this when I force an error:
Thread failed: 'KeyError'
Thread failed: 'KeyError'
Thread failed: 'KeyError'
Thread failed: 'KeyError'
Thread failed: 'KeyError'
*** Main thread waiting
When no errors are reported, the threads only read one record each:
(record)
(record)
(record)
(record)
(record)
*** Main thread waiting
In a normal threading setup, I understand that you can hand a queue to a thread by doing something like this:
Thread(target=function, args=(parameters, queue))
But when you use a class that inherits from the Thread object, how do you set this up properly? I can't seem to figure it out. One of my assumptions is that the queue object is passed by reference rather than copied, so every thread actually refers to the same queue in memory - is this true?
The threads are hanging, obviously, because they are not(?) daemon threads. Not only that, but it seems as though the threads only read one record each and then stop. Two things I want to do but don't really understand how to do:
1. If all threads fail, the main thread should move on and say "*** Done."
2. The threads should continue processing the queue until it is empty.
In order to do (2), I probably need something in the main thread like while not queue.empty(), but then how would I make sure that I limit the threads to a maximum of 5?
I figured out the answer to the question. After doing a lot of research and some code reading, what needs to happen is the following:
1. The queue should not be checked for emptiness, since that presents a race condition. Rather, the workers should run an infinite loop and keep attempting to retrieve from the Queue.
2. Whenever a queue task is finished, the queue.task_done() method needs to be called so that the main thread's join() can account for it. The number of task_done() calls will eventually match the number of put() calls, and join() returns once the queue has been fully processed.
3. Using a queue for a fixed-size data set is somewhat suboptimal. Instead of having every thread read off a shared queue, it would be better to simply partition the data into chunks of equal size and have each thread process its own list slice. That way we never get blocked in queue.get() waiting for an element that will never arrive, and there is no temptation to poll with something like while True: if not queue.empty(): do_something().
4. Exception handling should still call task_done() if we want to proceed past the failing item. Whether the whole thread should fail when an exception is caught is a design choice, but either way the element should still be marked as processed (a sketch of such a worker follows below).
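Putting those points together, the run() method from the question could look roughly like this (a sketch only; process_record stands in for the real work, and logger is whatever shared logger you configure):
def run(self):
    while True:                              # keep pulling until the program exits
        try:
            record = self.record_queue.get(block=True, timeout=1)
        except queue.Empty:
            continue                         # nothing available right now, poll again
        try:
            process_record(record)           # stand-in for the real processing
        except Exception:
            logger.exception("Thread failed while processing a record")
        finally:
            self.record_queue.task_done()    # always account for the item so join() can return
With that in place, jobs.join() in the main thread returns once every record has been accounted for; since the workers never exit on their own, you would typically also set worker.daemon = True before start() so the program can terminate.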
I have the main thread spawning worker threads that run infinite loops to do system monitoring. It looks something like this:
while True:
    Check_server_status(host)
    wait(minutes)
This worker thread should run forever because we need to constantly monitor the servers. Each thread currently monitors one machine, but I may scale it so each thread has a list of servers to check on. Each thread also writes the information it finds to a CSV file.
The main thread just spawns one of these worker threads for each host it finds in a list:
hosts = [a, b]
threads = []

for host in hosts:
    t = worker(host)
    t.daemon = True
    t.start()
    threads.append(t)
I am trying to make this script exit cleanly on Ctrl-C. So I want to make sure that the files are closed and that the threads exit. Is there any good way to handle this?
Thanks in advance for the help!
Well, for starters, daemon threads terminated by the main thread ending aren't cleaned up properly, so avoid that.
And KeyboardInterrupt isn't necessarily delivered to the main thread, or to any specific thread, so it's probably not best to rely on Python's default handler for SIGINT here, but instead write your own handler to replace it.
Probably the simplest approach is to have all the threads loop on a shared threading.Event object, looping based on a .wait(seconds) call (so your per-loop sleep is folded into the while condition, and you can still exit immediately, at any time during the wait).
So you might do:
import threading
shouldexit = threading.Event()
Then your worker functions would be of the form:
def worker(host):
    while not shouldexit.wait(minutes * 60):
        Check_server_status(host)
They'd all be launched without setting them as daemons, and you'd register a SIGINT handler like this one:
import signal

def ctrlchandler(signum, frame):
    print('User triggered quit')
    shouldexit.set()

# Set the signal handlers for the Ctrl-C related events
if hasattr(signal, 'CTRL_C_EVENT'):
    # Only on Windows
    signal.signal(signal.CTRL_C_EVENT, ctrlchandler)
signal.signal(signal.SIGINT, ctrlchandler)
Ideally, there would be no living main thread to kill off in this case (all threads would be looping on the shared Event, with the main thread exiting after launching the workers); if that's not the case, you'd need to figure out some way of terminating the main thread.
One approach might be to store off the default handler for SIGINT when you register your replacement handler, have your handler join all the worker threads after setting the Event, then explicitly invoke the original SIGINT handler so KeyboardInterrupt fires in a surviving thread as normal and cleans up the now workerless main thread.
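A sketch of that idea, assuming the shouldexit Event and the threads list from above (and assuming the previously installed SIGINT handler is a callable, which Python's default handler is):
import signal

original_sigint = signal.getsignal(signal.SIGINT)   # save whatever was installed before

def ctrlchandler(signum, frame):
    shouldexit.set()              # ask every worker loop to finish its current wait
    for t in threads:             # the list the workers were appended to
        t.join()
    # Hand control back to the original handler so KeyboardInterrupt is raised
    # in the (now workerless) main thread and the program exits normally.
    original_sigint(signum, frame)

signal.signal(signal.SIGINT, ctrlchandler)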
I have a set of long-running process in a typical "pub/sub" setup with queues for communication.
I would like to do two things, and I can't figure out how to accomplish both simultaneously:
1. Addition/removal of workers. For example, I want to be able to add extra consumers if I see that my pending queue size has grown too large.
2. A watchdog for my processes - I want to be notified if any of my producers or consumers crashes.
I can do (2) in isolation:
try:
    while True:
        for process in workers + consumers:
            if not process.is_alive():
                logger.critical("%-8s%s died!", process.pid, process.name)
        sleep(3)
except KeyboardInterrupt:
    # Python propagates CTRL+C to all workers, no need to terminate them
    logger.warn('Received CTRL+C, shutting down')
The above blocks, which prevents me from doing (1).
So I decided to move the code into its own process.
This doesn't work, because process.is_alive() only works for a parent checking the status of its children. In this case, the processes I want to check would be siblings instead of children.
I'm a bit stumped on how to proceed. How can my main process support changes to subprocesses while also monitoring subprocesses?
multiprocessing.Pool actually has a watchdog built-in already. It runs a thread that checks every 0.1 seconds to see if a worker has died. If it has, it starts a new one to take its place:
def _handle_workers(pool):
    thread = threading.current_thread()

    # Keep maintaining workers until the cache gets drained, unless the pool
    # is terminated.
    while thread._state == RUN or (pool._cache and thread._state != TERMINATE):
        pool._maintain_pool()
        time.sleep(0.1)
    # send sentinel to stop workers
    pool._taskqueue.put(None)
    debug('worker handler exiting')

def _maintain_pool(self):
    """Clean up any exited workers and start replacements for them.
    """
    if self._join_exited_workers():
        self._repopulate_pool()
This is primarily used to implement the maxtasksperchild keyword argument, and is actually problematic in some cases. If a process dies while a map or apply command is running, and that process is in the middle of handling a task associated with that call, that call will never finish. See this question for more information about that behavior.
That said, if you just want to know that a process has died, you can just create a thread (not a process) that monitors the pids of all the processes in the pool, and if the pids in the list ever change, you know a process has crashed:
def monitor_pids(pool):
    pids = [p.pid for p in pool._pool]
    while True:
        new_pids = [p.pid for p in pool._pool]
        if new_pids != pids:
            print("A worker died")
            pids = new_pids
        time.sleep(3)
Edit:
If you're rolling your own Pool implementation, you can take a cue from multiprocessing.Pool and run your monitoring code in a background thread in the parent process. The checks to see whether the processes are still running are quick, so the time lost to the background thread taking the GIL should be negligible. Consider that the multiprocessing.Pool watchdog runs every 0.1 seconds! Running yours every 3 seconds shouldn't cause any problems.
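For example, something along these lines (a sketch using the monitor_pids function above; my_pool stands for your own pool object):
import threading

# Run the pid monitor alongside the parent process's normal work.
watchdog = threading.Thread(target=monitor_pids, args=(my_pool,))
watchdog.daemon = True   # don't keep the process alive just for the watchdog
watchdog.start()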
I have a program which spawns 4 threads. These threads need to stay running indefinitely, and if one of them crashes I need to know so I can restart it.
Suppose I use a list with 4 numbers and pass it to each thread using a queue. Then all each thread has to do is reset its own section of the timer while the main thread counts it down.
So the queue will never be empty; only a single value could go to 0, and if that happens the main thread knows its child hasn't responded and can act accordingly.
But every time I .get() from the queue it becomes empty, so I have to get from the queue, store the value in a variable, modify the variable, and put it back on the queue.
Is it fine to use the queue like this for a watchdog?
If you're using Threads, you could regularly check through threading.enumerate to make sure that you have the correct number and kind of threads running.
But passing things into a Queue and having the thread pass them back is a technique that I have at least seen used to make sure that threads are still running. So, if I'm understanding you correctly, what you're doing isn't completely crazy.
Your "each thread must reset its sentinel occasionally" idea might make more sense as a list of Queues that each Thread is expected to respond to as soon as possible. This depends on whether your Threads are actually doing processor-intensive work, or whether they're just backgrounded for interface reasons. If they're not spending all their time doing math, you could do something like:
def guarded_thread(sentinel_queue, *args):
    while True:
        try:
            sentinel_queue.get_nowait()
            sentinel_queue.put('got it')
        except Queue.Empty:
            # we just want to make sure that we respond if we have been
            # pinged
            pass
        # do actual work with other args

def main(arguments):
    queues = [Queue() for q in range(4)]
    threads = [(Thread(target=guarded_thread, args=(queue, args)), queue)
               for queue, args in zip(queues, arguments)]
    for thread, queue in threads:
        thread.start()
    while True:
        for thread, queue in threads:
            queue.put(True)
        for thread, queue in threads:
            try:
                response = queue.get(True, MAX_TIMEOUT)
                if response != 'got it':
                    pass  # either re-send or restart the thread
            except Queue.Empty:
                pass  # restart the thread
        time.sleep(PING_INTERVAL)
Note that you could also use separate request/response queues to avoid having different kinds of sentinel values; it depends on your actual code which one would look less crazy.
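With separate queues, the ping/pong version might look roughly like this (a sketch; only the keep-alive part is shown, and the queue names are illustrative):
def guarded_thread(ping_queue, pong_queue, *args):
    while True:
        try:
            ping_queue.get_nowait()    # main thread pinged us
            pong_queue.put('alive')    # reply on the dedicated response queue
        except Queue.Empty:
            pass                       # no ping pending, just keep working
        # do actual work with other args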
I am trying to write a Unix client program that listens to a socket, stdin, and reads from file descriptors. I assign each of these tasks to an individual thread and have them successfully communicating with the "main" application using synchronized queues and a semaphore. The problem is that when I want to shut down these child threads, they are all blocking on input. Also, the threads cannot register their own signal handlers, because in Python only the main thread of execution is allowed to do so.
Any suggestions?
There is no good way to work around this, especially when the thread is blocking.
I had a similar issue (Python: How to terminate a blocking thread) and the only way I was able to stop my threads was to close the underlying connection, which caused the blocking thread to raise an exception and then allowed me to check the stop flag and close.
Example code:
class Example(object):

    def __init__(self):
        self.stop = threading.Event()
        self.connection = Connection()
        self.mythread = Thread(target=self.dowork)
        self.mythread.start()

    def dowork(self):
        while not self.stop.is_set():
            try:
                blockingcall()
            except CommunicationException:
                pass

    def terminate(self):
        self.stop.set()
        self.connection.close()
        self.mythread.join()
Another thing to note is that blocking operations commonly offer a timeout. If you have that option, I would consider using it. My last comment is that you could always set the thread to daemonic.
From the pydoc:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property.
Also, the threads cannot register signal handlers
Using signals to kill threads is potentially horrible, especially in C, and especially if you allocate memory as part of the thread, since it won't be freed when that particular thread dies (it belongs to the heap of the process). There is no garbage collection in C, so if that pointer goes out of scope, the memory remains allocated. So be careful with that one - only do it that way in C if you're going to kill all the threads and end the process, so that the memory is handed back to the OS; adding and removing threads from a thread pool that way, for example, will give you a memory leak.
The problem is that when I want to shutdown these child threads they are all blocking on input.
Funnily enough, I've been fighting with the same thing recently. The solution is literally: don't make blocking calls without a timeout. So, for example, what you want ideally is:
def threadfunc(running):
    while running:
        blockingcall(timeout=1)
where running is passed in from the controlling thread. I've never used threading, but I have used multiprocessing, and there you actually need to pass an Event() object and check is_set(). But you asked for design patterns; that's the basic idea.
Then, when you want this thread to end, you run:
running.clear()
mythread.join()
and your main thread should then allow your client thread to handle its last call, and return, and the whole program folds up nicely.
What do you do if you have a blocking call without a timeout? Use the asynchronous option, and sleep (as in call whatever method you have to suspend the thread for a period of time so you're not spinning) if you need to. There's no other way around it.
See these answers:
Python SocketServer
How to exit a multithreaded program?
Basically, don't block on recv() by using select() with a timeout to check for readability of the socket, and poll a quit flag when select() times out.
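A minimal sketch of that select()-based loop (sock is an already connected socket, quit_flag is a threading.Event set by the main thread at shutdown, and handle() stands in for whatever processing you do):
import select

def reader(sock, quit_flag):
    while not quit_flag.is_set():
        # Wait at most one second for the socket to become readable.
        readable, _, _ = select.select([sock], [], [], 1.0)
        if not readable:
            continue              # timed out; loop around and re-check the flag
        data = sock.recv(4096)
        if not data:              # peer closed the connection
            break
        handle(data)              # stand-in for the real processing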