In Python, I have a parent process that spawns a handful of child processes. I've run into a situation where, due to an unhandled exception, the parent process was dying and the child processes were left orphaned. How do I get the child processes to recognize that they've lost their parent?
I tried some code that hooks the child process up to every available signal, and none of them fired. I could theoretically put a giant try/except around the parent process to ensure that it at least sends a SIGTERM to the children, but this is inelegant and not foolproof. How can I prevent orphaned processes?
On UNIX (including Linux):
import os

def is_parent_running():
    try:
        # Signal 0 only performs the existence and permission checks.
        os.kill(os.getppid(), 0)
        return True
    except OSError:
        return False
Note that on UNIX, signal 0 is not a real signal. It is used only to test whether a given process exists. See the manual for the kill command.
You can use socketpair() to create a pair of Unix domain sockets before creating the subprocess. Have the parent keep one end open and the child the other. When the parent exits, its end of the socket will shut down. The child will then know the parent exited, because it can select()/poll() for read events on its socket and will receive end of file at that point.
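A minimal sketch of the child-side check described above, assuming a POSIX system. The helper name parent_has_exited and its timeout parameter are made up for illustration; only socket.socketpair(), select.select() and recv() come from the standard library.

```python
import select
import socket


def parent_has_exited(child_end, timeout=0.0):
    """Return True once the parent's end of the socket pair has been closed."""
    readable, _, _ = select.select([child_end], [], [], timeout)
    if not readable:
        return False  # nothing to read: the parent still holds its end
    # MSG_PEEK leaves any real data in place; an empty result means end of file.
    return child_end.recv(1, socket.MSG_PEEK) == b""
```

The parent would call socket.socketpair() before spawning, and the child would close its inherited copy of the parent's end before polling. If the child keeps that copy open, end of file never arrives, because the child itself still counts as a holder of the parent's end.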
Related
I am using a Python process to run one of my functions like so:
Process1 = Process(target = someFunction)
Process1.start()
That function has no loop or anything; it just does its thing and then ends. Does the Process die with it, or do I always need to drop a:
Process1.terminate()
Afterwards?
The child process will exit by itself - the Process1.terminate() is unnecessary in that regard. This is especially true if using any shared resources between the child and parent process. From the Python documentation:
Avoid terminating processes
Using the Process.terminate method to stop a process is liable to cause any shared resources (such as locks, semaphores, pipes and queues) currently being used by the process to become broken or unavailable to other processes.
Therefore it is probably best to only consider using Process.terminate on processes which never use any shared resources.
However, if you want the parent process to wait for the child process to finish (perhaps the child process is modifying something that the parent will access afterwards), then you'll want to use Process1.join() to block the parent process from continuing until the child process completes. This is generally good practice when using child processes, to avoid zombie processes or orphaned children.
No, as per the documentation it only sends a SIGTERM or TerminateProcess() to the process in question. If it has already exited then there is nothing to terminate.
However, it is always good practice to use exit codes in your subprocesses:
import sys
sys.exit(1)
And then check the exit code once you know the process has terminated:
if Process1.exitcode:
    errorHandle()
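Putting join() and the exit code together could look roughly like this. It is a sketch, not the one true pattern: worker and run_and_check are hypothetical names, and the "fork" start method is an assumption that restricts it to POSIX systems.

```python
import multiprocessing as mp
import sys

# Assumption: a POSIX system; the "fork" start method keeps the sketch short.
ctx = mp.get_context("fork")


def worker(should_fail):
    # Hypothetical job: report failure to the parent through the exit status.
    sys.exit(1 if should_fail else 0)


def run_and_check(should_fail):
    proc = ctx.Process(target=worker, args=(should_fail,))
    proc.start()
    proc.join()           # wait for the child to finish and reap it
    return proc.exitcode  # an attribute, not a method; None until the child exits
```

Note that exitcode is read as a plain attribute once join() has returned. A nonzero value means the child failed, and a negative value means it was killed by a signal.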
I have a subprocess via multiprocessing.Process and a queue via multiprocessing.Queue.
The main process is using multiprocessing.Queue.get() to get some new data. I don't want to have a timeout there and I want it to be blocking.
However, when the child process dies for whatever reason (manually killed by the user via kill, a segfault, etc.), Queue.get() will just hang forever.
How can I avoid that?
I think multiprocessing.Queue is not what I want.
I'm now using
parent_conn, child_conn = multiprocessing.Pipe(duplex=True)
to get two multiprocessing.Connection objects. Then I os.fork() or use multiprocessing.Process. In the child, I do:
parent_conn.close()
# read/write on child_conn
In the parent (after the fork), I do:
child_conn.close()
# read/write on parent_conn
That way, when I call recv() on the connection, it will raise an exception (EOFError) if the child/parent dies in the meantime.
Note that this works only for a single child. I guess Queue is meant for when you want multiple children. In that case, you would probably have some manager anyway that watches whether all children are alive and restarts them accordingly.
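Here is a small sketch of that Pipe approach, assuming a POSIX system and the "fork" start method so that both ends are inherited by the child. The function names child and read_until_child_gone are invented for the example.

```python
import multiprocessing as mp

# Assumption: POSIX with the "fork" start method, so both ends are inherited.
ctx = mp.get_context("fork")


def child(parent_conn, child_conn):
    parent_conn.close()       # drop the inherited copy of the parent's end
    child_conn.send("hello")
    # Returning ends the process, which closes child_conn just as a crash would.


def read_until_child_gone():
    parent_conn, child_conn = ctx.Pipe(duplex=True)
    proc = ctx.Process(target=child, args=(parent_conn, child_conn))
    proc.start()
    child_conn.close()        # the parent must close its copy of the child's end
    received = []
    try:
        while True:
            received.append(parent_conn.recv())  # blocking, no timeout needed
    except EOFError:
        pass                  # every handle on the child's end is now closed
    proc.join()
    return received
```

The two close() calls are what make this work: recv() only reports end of file once no process holds the other end any more, including the receiving process itself.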
The Queue has no way of knowing when it does not have any possible writers anymore. You could pass the object to any number of subprocesses, and it does not know if you passed it to any given subprocess. So it will have to wait, even if a subprocess dies. A queue is not a file descriptor that is automatically closed when the child dies.
What you are looking for is some kind of supervisor in the parent process that notices when children die unexpectedly and handles that situation in whatever way you think appropriate. You can do this by catching the SIGCHLD signal, checking Process.is_alive, or using Process.join in a thread. A simple implementation would use the timeout parameter in the Queue.get call and do a Process.is_alive check whenever it times out.
If you have a bit more control over the death of the child process, it should send an "EOF"-type object (None, or some kind of marker that it is done) to the queue so your parent process can handle it correctly.
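The "simple implementation" mentioned above might be sketched as follows. The helper get_or_detect_death, the poll_interval parameter and the choice of None as SENTINEL are all assumptions made for illustration, and the "fork" start method again limits this to POSIX.

```python
import multiprocessing as mp
import queue

ctx = mp.get_context("fork")  # assumption: POSIX system
SENTINEL = None               # hypothetical end-of-stream marker


def get_or_detect_death(q, proc, poll_interval=0.2):
    """Behave like q.get(), but raise RuntimeError if the producer dies first."""
    while True:
        try:
            return q.get(timeout=poll_interval)
        except queue.Empty:
            if proc.is_alive():
                continue  # producer is just slow, keep waiting
        # The producer is gone; take anything it flushed just before exiting.
        try:
            return q.get_nowait()
        except queue.Empty:
            pass
        raise RuntimeError("producer died with exit code %r" % proc.exitcode)
```

From the caller's point of view this still blocks until data arrives. The timeout is only used internally so that the liveness check gets a chance to run. A well-behaved producer would put SENTINEL on the queue as its last item so the consumer knows it finished normally.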
I have a Python 2.7 multiprocessing Process which will not exit on parent process exit. I've set the daemon flag which should force it to exit on parent death. The docs state that:
"When a process exits, it attempts to terminate all of its daemonic child processes."
p = Process(target=_serverLaunchHelper, args=args)
p.daemon = True
print p.daemon # prints True
p.start()
When I terminate the parent process via a kill command, the daemon is left alive and running (which blocks the port on the next run). The child process is starting a SimpleHTTPServer and calling serve_forever without doing anything else. My guess is that the "attempts" part of the docs means that the blocking server process is stopping process death and it's letting the process get orphaned as a result. I could have the child push the serving to another Thread and have the main thread check for parent process id changes, but this seems like a lot of code to just replicate the daemon functionality.
Does someone have insight into why the daemon flag isn't working as described? This is repeatable on Windows 8 64-bit and an Ubuntu 12 32-bit VM.
A boiled down version of the process function is below:
def _serverLaunchHelper(port):
    httpd = SocketServer.TCPServer(("", port), Handler)
    httpd.serve_forever()
When a process exits, it attempts to terminate all of its daemonic child processes.
The key word here is "attempts". Also, "exits".
Depending on your platform and implementation, it may be that the only way to get daemonic child processes terminated is to do so explicitly. If the parent process exits normally, it gets a chance to do so explicitly, so everything is fine. But if the parent process is terminated abruptly, it doesn't.
For CPython in particular, if you look at the source, terminating daemonic processes is handled the same way as joining non-daemonic processes: by walking active_children() in an atexit function. So, your daemons will be killed if and only if your atexit handlers get to run. And, as that module's docs say:
Note: the functions registered via this module are not called when the program is killed by a signal not handled by Python, when a Python fatal internal error is detected, or when os._exit() is called.
Depending on how you're killing the parent, you might be able to work around this by adding a signal handler to intercept abrupt termination. But you might not: on POSIX, for example, SIGKILL cannot be intercepted, so if you kill -9 $PARENTPID, this isn't an option.
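One way such a handler could look, as a sketch: it turns SIGTERM into a normal SystemExit so that atexit functions (and with them the cleanup of daemonic children) get a chance to run. The names exit_on_sigterm and install_sigterm_handler and the 128 + signum exit status convention are choices made for this example, not something multiprocessing requires.

```python
import signal
import sys


def exit_on_sigterm(signum, frame):
    # Raising SystemExit starts a normal interpreter shutdown, so atexit
    # handlers run instead of the process being killed outright.
    sys.exit(128 + signum)


def install_sigterm_handler():
    # Must be called from the main thread of the parent process.
    signal.signal(signal.SIGTERM, exit_on_sigterm)
```

This only helps for signals Python is allowed to handle. It does nothing for SIGKILL, and nothing when the interpreter itself crashes.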
Another option is to kill the process group instead of just the parent process. For example, if your parent has PID 12345, kill -- -12345 on Linux will kill it and all of its children (assuming you haven't done anything fancy).
Is there a way to stop the multiprocessing Python module from trying to call & wait on join() on child processes of a parent process shutting down?
2010-02-18 10:58:34,750 INFO calling join() for process procRx1
I want the process to which I sent a SIGTERM to exit as quickly as possible (i.e. "fail fast") instead of waiting for several seconds before finally giving up on the join attempt.
Clarifications: I have a "central process" which creates a bunch of "child processes". I am looking for a way to cleanly process a "SIGTERM" signal from any process in order to bring down the whole process tree.
Have you tried explicitly using Process.terminate?
You could try joining in a loop with a timeout (1 sec?) and checking if the thread is still alive, something like:
while True:
    a_thread.join(1)
    if not a_thread.is_alive():
        break
Terminating a_thread will trigger the break clause.
Sounds like setting your subprocess' flag Process.daemon = True may be what you want, since daemonic children are terminated rather than joined:
Process.daemon:
The process’s daemon flag, a Boolean value. This must be set before start() is called.
The initial value is inherited from the creating process.
When a process exits, it attempts to terminate all of its daemonic child processes.
Note that a daemonic process is not allowed to create child processes. Otherwise a daemonic process would leave its children orphaned if it gets terminated when its parent process exits. Additionally, these are not Unix daemons or services, they are normal processes that will be terminated (and not joined) if non-daemonic processes have exited.
Is there a way for a child process in Python to detect if the parent process has died?
If your Python process is running under Linux, and the prctl() system call is exposed, you can use the answer here.
This can cause a signal to be sent to the child when the parent process dies.
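A rough sketch of calling prctl() from Python through ctypes, assuming Linux and a libc that exposes prctl. The option numbers are the PR_SET_PDEATHSIG and PR_GET_PDEATHSIG values from <linux/prctl.h>; the wrapper function names are made up for this example.

```python
import ctypes
import os
import signal
import sys

PR_SET_PDEATHSIG = 1  # option numbers from <linux/prctl.h>
PR_GET_PDEATHSIG = 2


def _libc():
    if not sys.platform.startswith("linux"):
        raise NotImplementedError("prctl() is Linux-specific")
    return ctypes.CDLL(None, use_errno=True)


def set_parent_death_signal(sig=signal.SIGTERM):
    """Ask the kernel to send `sig` to this process when its parent dies."""
    if _libc().prctl(PR_SET_PDEATHSIG, ctypes.c_ulong(int(sig))) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def get_parent_death_signal():
    """Return the configured parent-death signal (0 means none is set)."""
    current = ctypes.c_int(0)
    if _libc().prctl(PR_GET_PDEATHSIG, ctypes.byref(current)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return current.value
```

The child would call set_parent_death_signal() as early as possible after starting. If the parent has already died by then, no signal is delivered, so it is worth checking os.getppid() right after the call as well.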
Assuming the parent is alive when you start to do this, you can check whether it is still alive in a polling loop like this, using psutil:
import os, time
import psutil

# parent() is a method in current psutil releases; look the parent up once.
parent = psutil.Process(os.getpid()).parent()
while True:
    if parent is not None and parent.is_running():
        # still alive
        time.sleep(0.1)
        continue
    else:
        print("my parent is gone")
        break
Not very nice but...
The only reliable way I know of is to create a pipe specifically for this purpose. The child will have to repeatedly attempt to read from the pipe, preferably in a non-blocking fashion, or using select. It will see end of file or an error once the parent's end of the pipe no longer exists (presumably because of the parent's death).
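As one possible variation on the pipe idea, the child can leave the reading to a background thread instead of polling in its main loop. This is a sketch under the assumption that the parent created the pipe with os.pipe() and that the child has closed its own copy of the write end; watch_parent and on_parent_exit are hypothetical names.

```python
import os
import threading


def watch_parent(read_fd, on_parent_exit):
    """Call on_parent_exit() once every write end of the pipe is closed."""
    def _watch():
        while os.read(read_fd, 4096):  # os.read returns b"" at end of file
            pass                       # ignore any data the parent sends
        on_parent_exit()

    thread = threading.Thread(target=_watch, daemon=True)
    thread.start()
    return thread
```

A child might pass a callback that sets an event or calls os._exit(), depending on how much cleanup it needs. The blocking read costs no CPU while the parent is alive.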
You might get away with reading your parent process' ID very early in your process, and then checking, but of course that is prone to race conditions. The parent that did the spawn might have died immediately, and even before your process got to execute its first instruction.
Unless you have a way of verifying if a given PID refers to the "expected" parent, I think it's hard to do reliably.
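One conceivable way to get an "expected" parent PID without the race is for the parent to pass its own os.getpid() to the child, for example as an argument, before the child starts. The sketch below assumes POSIX, where an orphan is re-parented and os.getppid() therefore changes; parent_changed is a made-up helper name.

```python
import os


def parent_changed(expected_ppid):
    """Return True if this process no longer has `expected_ppid` as its parent."""
    # expected_ppid is assumed to be the PID the parent reported about itself.
    return os.getppid() != expected_ppid
```

Because the expected value comes from the parent rather than from an early os.getppid() call in the child, a parent that died before the child's first instruction is still detected.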