Python Multiprocessing respawn crashed processes

Python Multiprocessing respawn crashed processes - python

I want to create some worker processes and if they crash due to an exception, I would like them to respawn. Aside from the is_alive method in the multiprocessing module, I can't seem to find a way to do this.
This would require me to iterate over all the running processes (after a sleep) and check if they are alive. This is essentially a busy loop, I was wondering if there was a better solution that will wake up my program in the event that any one of my worker processes has crashed. Once it wakes up, I would like to log th exception that crashed my program and launch another process.

Polling to see if the child processes are alive should work fine, since it's a low-overhead check and you don't need to check that often.
The first answer to this (similar) question has a Python code example: Multi-server monitor/auto restarter in python

You can wrap your worker processes in try/except blocks where the except pushes a message onto a pipe before raising. Of course, polling isn't really worse than this and it's simpler.

If you're on a unix-like system, your main program can be notified of dead children by installing a signal handler. Look up your operating system's documentation on signal(), especially SIGCHLD. I'm afraid I don't remember whether Windows covers SIGCHLD with its very limited POSIX signal support.

Related

What is the best way to debug a python multiprocess script which fails to terminate?

I am writing a python script which uses multiprocessing, multithreading and zeromq for interprocess communication. It all works fine until the program finishes: at that time the child processes terminate properly (sigwait is intercepted and the child procs terminate which I have confirmed with the ps command) but the main process often does not shut down - occasionally it does, but most of the time it does not. I have confirmed that all remaining threads of the main process are daemonic and that the last row of the script is executed properly (it is a logging.info call). I am using fork for forking processes and can see that a Forkprocess still runs in addition to the main process.
What is the best way to debug this, considering that the script has actually finished ? Maybe add a pdb or breakpoint() right at the end ?
Thanks in advance.
Here is the output, after the last row the script usually does not terminate:
INFO root::remaining active child processes: [<ForkProcess name='SyncManager-1' pid=6362 parent=6361 started>]
INFO root::non-daemonic threads which are still running, preventing orderly shutdown: [].
INFO root::======== PID: 6361 main() end: shut down completed.=========
EDIT:
I refactored the code and noticed that it now misbehaves very rarely. I am 99.9% certain that it is due to an open zeromq REQ/REP 'socket' at the time of shutdown. The refactoring made sure that these sockets are only held open only for a very short time - but it is not predictable what sockets are open at shutdown so occasionally it still hangs.
I will write a simple testharness with two processes communicating via REQ/REP sockets then shut down the child process followed by main process. I expect same result, i.e., interpreter not shutting down. Lets see, keep you posted.

I think you could try viztracer. The good thing about viztracer is that it can display all the processes on the same timeline. Maybe you can catch what's stopping your main process/forked process from shutting down. If it's a deadlock it should be noticeable. However, without the code, I really can't tell if it would help for sure.

Clean up a thread without .join() and without blocking the main thread

I am in a situation where I have two endpoints I can ask for a value, and one may be faster than the other. The calls to the endpoints are blocking. I want to wait for one to complete and take that result without waiting for the other to complete.
My solution was to issue the requests in separate threads and have those threads set a flag to true when they complete. In the main thread, I continuously check the flags (I know it is a busy wait, but that is not my primary concern right now) and when one completes it takes that value and returns it as the result.
The issue I have is that I never clean up the other thread. I can't find any way to do it without using .join(), which would just block and defeat the purpose of this whole thing. So, how can I clean up that other, slower thread that is blocking without joining it from the main thread?

What you want is to make your threads daemons, so when you get the result and finish your main, the other running thread will be forced to finish. You do that by changing the daemon keyword to True:
tr = threading.Thread(daemon=True)
From the threading docs:
The significance of this flag is that the entire Python program exits
when only daemon threads are left.
Although:
Daemon threads are abruptly stopped at shutdown. Their resources (such
as open files, database transactions, etc.) may not be released
properly. If you want your threads to stop gracefully, make them
non-daemonic and use a suitable signalling mechanism such as an Event.
I don't have any particular experience with Events so can't elaborate on that. Feel free to click the link and read on.

One bad and dirty solution is to implement a methode for the threads which close the socket which is blocking. Now you have to catch the exception in the main thread.

How can I kill a thread in python [duplicate]

This question already has answers here:
Is there any way to kill a Thread?
(31 answers)
Closed 6 years ago.
I start a thread using the following code.
t = thread.start_new_thread(myfunction)
How can I kill the thread t from another thread. So basically speaking in terms of code, I want to be able to do something like this.
t.kill()
Note that I'm using Python 2.4.

In Python, you simply cannot kill a Thread.
If you do NOT really need to have a Thread (!), what you can do, instead of using the threading package (http://docs.python.org/2/library/threading.html), is to use the multiprocessing package (http://docs.python.org/2/library/multiprocessing.html). Here, to kill a process, you can simply call the method:
yourProcess.terminate() # kill the process!
Python will kill your process (on Unix through the SIGTERM signal, while on Windows through the TerminateProcess() call). Pay attention to use it while using a Queue or a Pipe! (it may corrupt the data in the Queue/Pipe)
Note that the multiprocessing.Event and the multiprocessing.Semaphore work exactly in the same way of the threading.Event and the threading.Semaphore respectively. In fact, the first ones are clones of the latters.
If you REALLY need to use a Thread, there is no way to kill your threads directly. What you can do, however, is to use a "daemon thread". In fact, in Python, a Thread can be flagged as daemon:
yourThread.daemon = True # set the Thread as a "daemon thread"
The main program will exit when no alive non-daemon threads are left. In other words, when your main thread (which is, of course, a non-daemon thread) will finish its operations, the program will exit even if there are still some daemon threads working.
Note that it is necessary to set a Thread as daemon before the start() method is called!
Of course you can, and should, use daemon even with multiprocessing. Here, when the main process exits, it attempts to terminate all of its daemonic child processes.
Finally, please, note that sys.exit() and os.kill() are not choices.

If your thread is busy executing Python code, you have a bigger problem than the inability to kill it. The GIL will prevent any other thread from even running whatever instructions you would use to do the killing. (After a bit of research, I've learned that the interpreter periodically releases the GIL, so the preceding statement is bogus. The remaining comment stands, however.)
Your thread must be written in a cooperative manner. That is, it must periodically check in with a signalling object such as a semaphore, which the main thread can use to instruct the worker thread to voluntarily exit.
while not sema.acquire(False):
# Do a small portion of work…
or:
for item in work:
# Keep working…
# Somewhere deep in the bowels…
if sema.acquire(False):
thread.exit()

You can't kill a thread from another thread. You need to signal to the other thread that it should end. And by "signal" I don't mean use the signal function, I mean that you have to arrange for some communication between the threads.

Stopping a thread in python

I am creating a thread in my Python app with thread.start_new_thread.
How do I stop it if it hasn't finished in three seconds time?

You can't do that directly. Anyway aborting a thread is not good practice - rather think about using synchronization mechanisms that let you abort the thread in a "soft" way.
But daemonic threads will automatically be aborted if no non-daemonic threads remain (e.g. if the only main thread ends). Maybe that's what you want.

If you really need to do this (e.g. the thread calls code that may hang forever) then consider rewriting your code to spawn a process with the multiprocessing module. You can then kill the process with the Process.terminate() method. You will need 2.6 or later for this, of course.

You cannot. Threads can't be killed from outside. The only thing you can do is add a way to ask the thread to exit. Obviously you won't be able to do this if the thread is blocked in some systemcall.

As noted in a related question, you might be able to raise an exception through ctypes.pythonapi, but not while it's waiting on a system call.

detection of communication failure when "put" in queue

I am using the multiprocessing python module with Queue for communication between processes. Some processes only send (i.e. queue.put) and I can't seem to find a way to detect when the receiving end gets terminated abruptly.
Is there a way to detect if the process at the other end of the Queue gets terminated without having to get from the Queue? Isn't there a signal I could trap somehow? Or do I have to periodically get from the Queue and trap the EOFError manually.

I don't believe multiprocessing sets up a "watch-dog" process for you to take care of crashes or kills of some of your processes. It may be worth your while to set one up (pretty hard to do cross-platform, but if, say, you're only worried about Linux, it's not that terrible).

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python Multiprocessing respawn crashed processes - python

Polling to see if the child processes are alive should work fine, since it's a low-overhead check and you don't need to check that often. The first answer to this (similar) question has a Python code example: Multi-server monitor/auto restarter in python

You can wrap your worker processes in try/except blocks where the except pushes a message onto a pipe before raising. Of course, polling isn't really worse than this and it's simpler.

If you're on a unix-like system, your main program can be notified of dead children by installing a signal handler. Look up your operating system's documentation on signal(), especially SIGCHLD. I'm afraid I don't remember whether Windows covers SIGCHLD with its very limited POSIX signal support.

Related

What is the best way to debug a python multiprocess script which fails to terminate?

Clean up a thread without .join() and without blocking the main thread

How can I kill a thread in python [duplicate]

Stopping a thread in python

detection of communication failure when "put" in queue

Categories

Resources