I am using a Python process to run one of my functions like so:
Process1 = Process(target = someFunction)
Process1.start()
That function has no loop; it just does its work and then returns. Does the Process die with it, or do I always need to add a:
Process1.terminate()
Afterwards?
The child process will exit by itself, so Process1.terminate() is unnecessary in that regard. Avoiding terminate() matters even more if the child and parent share any resources. From the Python documentation:
Avoid terminating processes
Using the Process.terminate method to stop a process is liable to cause any shared resources (such as locks, semaphores, pipes and queues) currently being used by the process to become broken or unavailable to other processes.
Therefore it is probably best to only consider using Process.terminate on processes which never use any shared resources.
However, if you want the parent process to wait for the child process to finish (perhaps the child process is modifying something that the parent will access afterwards), then you'll want to use Process1.join() to block the parent process from continuing until the child process completes. This is generally good practice when using child processes, to avoid zombie processes or orphaned children.
No, as per the documentation it only sends a SIGTERM or TerminateProcess() to the process in question. If it has already exited then there is nothing to terminate.
However, it is always good practice to use exit codes in your subprocesses:
import sys
sys.exit(1)
And then check the exit code once you know the process has terminated:
# exitcode is an attribute, not a method; it is None while the process is still running
if Process1.exitcode:
    errorHandle()
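Putting the two answers together, a minimal sketch might look like the following. The names some_function and run_and_get_exitcode are placeholders for illustration, not part of the original question.

```python
import multiprocessing as mp
import sys


def some_function():
    # Hypothetical worker: report failure through a nonzero exit code.
    sys.exit(1)


def run_and_get_exitcode():
    child = mp.Process(target=some_function)
    child.start()
    child.join()           # blocks until the child exits, and reaps it
    return child.exitcode  # an attribute; None only while still running


if __name__ == "__main__":
    code = run_and_get_exitcode()
    if code:
        print(f"child failed with exit code {code}")
```

No terminate() call is needed here: the child ends when some_function returns or exits, and join() collects it.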
Related
I have a Python application which runs as the main process in a kubernetes pod, and this process kicks off some child processes to long poll a list of SQS queues (1 process per queue). Occasionally, one of the processes becomes a zombie and stops processing, and hangs up all other processes too, including the parent.
if __name__ == '__main__':
    PROCESSES = []
    for queue, module in qfmapper.items():
        PROCESSES.append(Process(target=poll_for_messages, args=(queue, module)))
    for process in PROCESSES:
        process.start()
    for process in PROCESSES:
        process.join()
I've tried handling the SIGCHLD signal in the parent before it kicks off the children, but that doesn't seem to kill the parent if one of the children is killed. I know this leaves behind other child processes, but since kubernetes kills the pod if PID 1 dies, it shouldn't matter. This however doesn't seem to work, as the parent doesn't react to it. I'm assuming this is because process.join() blocks the parent.
So I've tried replacing individual Process calls with a Pool:
with contextlib.closing(mp.Pool(len(qfmapper))) as pool:
    for queue, module in qfmapper.items():
        pool.apply_async(poll_for_messages, args=(queue, module))
    pool.close()
    pool.join()
This again kicks off the polling processes as expected, but a killed worker doesn't get replaced with the same call. The Pool spins up another worker to maintain its size, but it doesn't start that worker with the same arguments as the original apply_async call.
I also tried using map, and that does restart the process if killed, but doesn't loop through all of the queues in my list; it just does the first one in the list multiple times. I've also tried starmap, and just used the for loop to build a list of iterables, but again that doesn't recover if one of the workers is killed.
So, ultimately, this comes back to the title of this question. How do you automatically restart a process that has died / been killed? I've searched high and low and I can't seem to find any answers for what seems to me like a "normal" thing to want to do. This is all running on Python 3.7.3, but I can upgrade to 3.8 if it has any features worth using to resolve this issue.
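One possible approach, sketched here under some assumptions, is to have the parent wait on each child's Process.sentinel with multiprocessing.connection.wait and start a replacement with the same arguments whenever one exits. The poll_for_messages body below is a stand-in that dies on purpose, and max_restarts is a hypothetical cap added only so the sketch terminates; a real supervisor would probably restart indefinitely and add some backoff.

```python
import multiprocessing as mp
import os
import time
from multiprocessing.connection import wait


def poll_for_messages(queue, module):
    # Stand-in for the real long-polling worker: it simply dies.
    time.sleep(0.1)
    os._exit(1)


def supervise(jobs, max_restarts=3):
    """Start one process per job and restart any that exits.

    Returns the number of restarts performed.
    """
    running = {}
    for args in jobs:
        proc = mp.Process(target=poll_for_messages, args=args)
        proc.start()
        running[proc.sentinel] = (proc, args)

    restarts = 0
    while running:
        # wait() blocks until at least one child has exited and
        # returns the sentinels of the children that are done.
        for sentinel in wait(list(running)):
            proc, args = running.pop(sentinel)
            proc.join()  # reap the dead child so it is not a zombie
            if restarts < max_restarts:
                restarts += 1
                new = mp.Process(target=poll_for_messages, args=args)
                new.start()
                running[new.sentinel] = (new, args)
    return restarts
```

Because the parent blocks in wait() rather than in a per-process join(), it notices whichever child dies first, regardless of the order in which the children were started.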
I have a subprocess via multiprocessing.Process and a queue via multiprocessing.Queue.
The main process is using multiprocessing.Queue.get() to get some new data. I don't want to have a timeout there and I want it to be blocking.
However, when the child process dies for whatever reason (manually killed by the user via kill, a segfault, etc.), Queue.get() will just hang forever.
How can I avoid that?
I think multiprocessing.Queue is not what I want.
I'm now using
parent_conn, child_conn = multiprocessing.Pipe(duplex=True)
to get two multiprocessing.Connection objects. Then I os.fork() or use multiprocessing.Process. In the child, I do:
parent_conn.close()
# read/write on child_conn
In the parent (after the fork), I do:
child_conn.close()
# read/write on parent_conn
That way, when I call recv() on the connection, it will raise an exception (EOFError) when the child/parent dies in the meanwhile.
Note that this works only for a single child. I guess Queue is meant for when you want multiple children. In that case, you would probably have some manager anyway which watches whether all children are alive and restarts them accordingly.
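A minimal sketch of the Pipe approach described above, assuming a hypothetical child_main that sends one message and then exits. The important detail is that the parent closes its own copy of child_conn; otherwise the pipe still has an open writer and recv() never sees end-of-file.

```python
import multiprocessing as mp


def child_main(conn):
    # Hypothetical child: send one message, then exit.
    conn.send("hello")
    # Exiting closes the child's end of the pipe.


def read_until_child_dies():
    parent_conn, child_conn = mp.Pipe(duplex=True)
    child = mp.Process(target=child_main, args=(child_conn,))
    child.start()
    child_conn.close()  # the parent must drop its copy of the child's end

    received = []
    try:
        while True:
            received.append(parent_conn.recv())  # blocking, no timeout
    except EOFError:
        # Raised once the other end is closed and no data is left.
        pass
    child.join()
    return received
```

The same EOFError is raised whether the child exits normally or is killed, so the blocking recv() cannot hang on a dead child.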
The Queue has no way of knowing when it does not have any possible writers anymore. You could pass the object to any number of subprocesses, and it does not know if you passed it to any given subprocess. So it will have to wait, even if a subprocess dies. A queue is not a file descriptor that is automatically closed when the child dies.
What you are looking for is some kind of supervisor in the parent process that notices when children die unexpectedly and handles that situation in whatever way you think appropriate. You can do this by catching the SIGCHLD signal, checking Process.is_alive, or using Process.join in a thread. A simple implementation would use the timeout parameter in the Queue.get call and do a Process.is_alive check when that times out.
If you have a bit more control over the death of the child process, it should send an "EOF"-type object (None, or some kind of marker that it is done) to the queue so your parent process can handle it correctly.
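The timeout-plus-liveness check could be sketched roughly as follows. The producer here is a hypothetical child that dies abruptly without sending an end marker, and poll_interval is an assumed parameter name. Liveness is sampled before each get() so that an item written just before the child died is not missed.

```python
import multiprocessing as mp
import os
import queue


def producer(q):
    # Hypothetical child: send one item, then die without an end marker.
    q.put("item")
    q.close()
    q.join_thread()  # make sure the item is flushed to the pipe first
    os._exit(1)      # simulate an abrupt death


def consume(poll_interval=0.2):
    q = mp.Queue()
    child = mp.Process(target=producer, args=(q,))
    child.start()
    items = []
    while True:
        alive = child.is_alive()  # sample before reading
        try:
            items.append(q.get(timeout=poll_interval))
        except queue.Empty:
            if not alive:
                break  # child was already dead and nothing is left
    child.join()
    return items
```

The get() call still blocks most of the time; the timeout only bounds how long the parent takes to notice that its only writer has gone.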
I have a problem creating a parallel program using multiprocessing. AFAIK, when I start a new process using this module I should call os.wait() or childProcess.join() to get its exit status. But placing those calls in my program can stall the main process if something happens to the child process (for example, if the child hangs).
The problem is that if I don't do that, the child processes become zombies (listed as something like "python <defunct>" in top).
Is there any way to avoid waiting for child processes to end while also avoiding zombie processes, and/or without making the main process bother so much about its child processes?
Though ars' answer should solve your immediate issues, you might consider looking at celery: http://ask.github.com/celery/index.html. It's a relatively developer-friendly approach to accomplishing these goals and more.
You may have to provide more information or actual code to figure this out. Have you been through the documentation, in particular the sections labeled "Warning"? For example, you may be facing something like this:
Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread()), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
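To illustrate the warning, here is a small sketch in which the parent consumes everything from the queue before joining. The worker and the item size are made up for the example; the items are deliberately large so that they cannot all sit in the pipe buffer.

```python
import multiprocessing as mp


def worker(q, count):
    # Hypothetical worker: each item is large enough to fill the pipe buffer.
    for _ in range(count):
        q.put("x" * 1_000_000)


def collect(count=3):
    q = mp.Queue()
    child = mp.Process(target=worker, args=(q, count))
    child.start()
    # Consume everything first. Calling child.join() before this point
    # could deadlock, because the child cannot exit until its buffered
    # items have been flushed to the pipe.
    results = [q.get() for _ in range(count)]
    child.join()
    return results
```

Swapping the last two steps (join, then get) is the pattern the documentation warns about.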
Is there a way to stop the multiprocessing Python module from trying to call & wait on join() on child processes of a parent process shutting down?
2010-02-18 10:58:34,750 INFO calling join() for process procRx1
I want the process to which I sent a SIGTERM to exit as quickly as possible (i.e. "fail fast") instead of waiting for several seconds before finally giving up on the join attempt.
Clarifications: I have a "central process" which creates a bunch of "child processes". I am looking for a way to cleanly process a "SIGTERM" signal from any process in order to bring down the whole process tree.
Have you tried explicitly calling Process.terminate?
You could try joining in a loop with a timeout (1 sec?) and checking if the thread is still alive, something like:
while True:
    a_thread.join(1)
    # is_alive() replaces isAlive(), which was removed in Python 3.9
    if not a_thread.is_alive():
        break
Terminating a_thread will trigger the break clause.
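The two suggestions can be combined into a "join with a grace period, then terminate" helper. This is only a sketch: stop, sleeper and grace are names invented for the example, and terminate() carries the shared-resource caveats from the documentation.

```python
import multiprocessing as mp
import time


def sleeper():
    # Hypothetical child that would otherwise run for a long time.
    time.sleep(60)


def stop(proc, grace=1.0):
    """Wait up to `grace` seconds, then terminate the process if needed."""
    proc.join(grace)
    if proc.is_alive():
        proc.terminate()  # SIGTERM on Unix, TerminateProcess() on Windows
        proc.join()       # reap the child after terminating it
    return proc.exitcode


def demo():
    proc = mp.Process(target=sleeper)
    proc.start()
    return stop(proc, grace=0.2)
```

On Unix, a child ended by a signal reports a negative exit code (the negated signal number), which lets the caller tell a forced stop from a normal exit.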
Sounds like setting your subprocess' flag Process.daemon = True may be what you want, since daemonic children are terminated rather than joined when the parent exits:
Process.daemon:
The process’s daemon flag, a Boolean value. This must be set before start() is called.
The initial value is inherited from the creating process.
When a process exits, it attempts to terminate all of its daemonic child processes.
Note that a daemonic process is not allowed to create child processes. Otherwise a daemonic process would leave its children orphaned if it gets terminated when its parent process exits. Additionally, these are not Unix daemons or services, they are normal processes that will be terminated (and not joined) if non-daemonic processes have exited.
Is there a way for a child process in Python to detect if the parent process has died?
If your Python process is running under Linux, and the prctl() system call is exposed, you can use the answer here.
This can cause a signal to be sent to the child when the parent process dies.
Assuming the parent is alive when you start to do this, you can check whether it is still alive in a busy loop as such, by using psutil:
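import os
import time

import psutil

# Capture the parent once, while it is assumed to still be alive.
# In current psutil, parent() is a method, not an attribute.
parent = psutil.Process(os.getpid()).parent()
while True:
    if parent is not None and parent.is_running():
        # still alive
        time.sleep(0.1)
        continue
    print("my parent is gone")
    break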
Not very nice but...
The only reliable way I know of is to create a pipe specifically for this purpose. The child will have to repeatedly attempt to read from the pipe, preferably in a non-blocking fashion, or using select. It will get an error when the pipe does not exist anymore (presumably because of the parent's death).
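A rough sketch of the pipe idea using multiprocessing.Pipe: the parent keeps the write end open for its whole lifetime and the child periodically checks the read end. The helper name parent_is_gone is made up for this example. With a forked child, the child must also close its own inherited copy of the write end, otherwise end-of-file never arrives.

```python
import multiprocessing as mp


def parent_is_gone(conn, timeout=0.0):
    """Return True if the other end of the pipe has been closed."""
    try:
        if conn.poll(timeout):
            conn.recv()  # discard any data; only EOF matters here
    except (EOFError, OSError):
        return True
    return False


if __name__ == "__main__":
    reader, writer = mp.Pipe(duplex=False)
    print(parent_is_gone(reader))  # writer still open
    writer.close()                 # simulates the parent going away
    print(parent_is_gone(reader))
```

Passing a nonzero timeout turns the check into a short blocking wait, which avoids a tight polling loop in the child.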
You might get away with reading your parent process' ID very early in your process, and then checking, but of course that is prone to race conditions. The parent that did the spawn might have died immediately, and even before your process got to execute its first instruction.
Unless you have a way of verifying if a given PID refers to the "expected" parent, I think it's hard to do reliably.