Python avoid orphan processes


I'm using python to benchmark something. This can take a large amount of time, and I want to set a (global) timeout. I use the following script (summarized):
import signal
import subprocess

class TimeoutException(Exception):
    pass

def timeout_handler(signum, frame):
    raise TimeoutException()

# Install the handler; without it, SIGALRM would simply kill the script
signal.signal(signal.SIGALRM, timeout_handler)
# Halt problem after half an hour
signal.alarm(1800)
try:
    # (summarized: solution, guess, problemfname and solutionfname come from elsewhere)
    while solution is None:
        guess = guess()
        try:
            with open(solutionfname, 'wb') as solutionf:
                solverprocess = subprocess.Popen(["solver", problemfname], stdout=solutionf)
                solverprocess.wait()
        finally:
            # `solverprocess.poll() == None` instead of try didn't work either
            try:
                solverprocess.kill()
            except:
                # Solver process was already dead
                pass
except TimeoutException:
    pass
# Cancel alarm if it's still active
signal.alarm(0)
However, it sometimes leaves orphan processes behind, and I can't reliably reproduce the circumstances. Does anyone know the correct way to prevent this?

You simply have to call wait() after killing the process.

The documentation for the kill() method states:
Kills the child. On Posix OSs the function sends SIGKILL to the child.
On Windows kill() is an alias for terminate().
In other words, if you aren't on Windows, you are only sending a signal to the subprocess.
This will create a zombie process, because the parent process never reads the exit status of the subprocess.
The kill() and terminate() methods are just shortcuts for send_signal(SIGKILL) and send_signal(SIGTERM).
Try adding a call to wait() after the kill(). This is even shown in the example under the documentation for communicate():
proc = subprocess.Popen(...)
try:
    outs, errs = proc.communicate(timeout=15)
except TimeoutExpired:
    proc.kill()
    outs, errs = proc.communicate()
Note the call to communicate() after the kill(). (It is equivalent to calling wait() and also reading the outputs of the subprocess.)
I want to clarify one thing: it seems like you don't understand exactly what a zombie process is. A zombie process is a terminated process. The kernel keeps the process in the process table until the parent process reads its exit status. I believe all memory used by the subprocess is actually released; the kernel only has to keep track of the exit status of such a process.
So, the zombie processes you see aren't running. They are already completely dead, and that's why they are called zombies. They are "alive" in the process table, but aren't really running at all.
Calling wait() does exactly this: it waits until the subprocess ends and reads its exit status. This allows the kernel to remove the subprocess from the process table.
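A minimal sketch of that fix as a helper, assuming a POSIX system. The name kill_and_reap is hypothetical; it only uses the real Popen.poll(), Popen.kill() and Popen.wait() methods.

```python
import signal
import subprocess
import sys


def kill_and_reap(proc, timeout=None):
    """Kill a Popen child (if still running) and collect its exit status.

    kill_and_reap is a hypothetical helper, not part of the subprocess API.
    """
    if proc.poll() is None:
        # Still running: on POSIX, kill() only *sends* SIGKILL.
        proc.kill()
    # wait() reads the exit status, which lets the kernel drop the zombie
    # entry from the process table. Calling it again later is harmless.
    return proc.wait(timeout)


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(kill_and_reap(child) == -signal.SIGKILL)  # True on POSIX
```

In the question's loop, the finally clause would then call kill_and_reap(solverprocess) instead of the bare kill().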

On Linux, you can use python-prctl.
Define a preexec function such as:
import prctl  # provided by the python-prctl package

def pre_exec():
    import signal
    prctl.set_pdeathsig(signal.SIGTERM)
And have your Popen call pass it.
subprocess.Popen(..., preexec_fn=pre_exec)
That's all there is to it. Now, if the parent dies, the child process is sent SIGTERM (and, unless it handles that signal, dies) rather than becoming an orphan.
If you don't like the external dependency of python-prctl you can also use the older prctl. Instead of
prctl.set_pdeathsig(signal.SIGTERM)
you would have
prctl.prctl(prctl.PDEATHSIG, signal.SIGTERM)
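If neither package is available, the same prctl(2) call can be made through ctypes from the standard library. This is a swapped-in technique and a Linux-only sketch: the helper name set_parent_death_signal is hypothetical, and the constant PR_SET_PDEATHSIG = 1 is taken from <linux/prctl.h>. Note that the kernel sends the signal when the thread that created the child exits, not necessarily when the whole parent process does.

```python
import ctypes
import os
import signal
import subprocess

PR_SET_PDEATHSIG = 1  # value from <linux/prctl.h>

# Resolve libc in the parent, before any fork(); CDLL(None) exposes the
# symbols already loaded into the interpreter, including libc's prctl().
_libc = ctypes.CDLL(None, use_errno=True)


def set_parent_death_signal(sig=signal.SIGTERM):
    """preexec_fn for Popen (hypothetical helper, Linux only): ask the kernel
    to send `sig` to the child when the thread that spawned it exits."""
    # prctl() is variadic and the kernel reads arg2..arg5 as unsigned long,
    # so pass them with an explicit C type.
    args = [ctypes.c_ulong(int(sig))] + [ctypes.c_ulong(0)] * 3
    if _libc.prctl(PR_SET_PDEATHSIG, *args) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    proc = subprocess.Popen(["sleep", "100"], preexec_fn=set_parent_death_signal)
    print("child", proc.pid, "gets SIGTERM if this script dies")
    proc.kill()
    proc.wait()
```

The setting survives the exec of the child program (except for set-user-ID/set-group-ID binaries), which is what makes the preexec_fn approach work.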

Related

Python subprocess.communicate hangs when parent leaves zombies

I'm trying to use Popen to create a subprocess A along with a thread that communicates with it using Popen.communicate. The main process will wait on the thread using Thread.join with a specified timeout, and kills A after that timeout expires, which should cause the thread to die as well.
However, this doesn't seem to work when A itself spawns more subprocesses B, C and D with different process groups than A that refuse to die. Even after A is dead and labelled defunct, and even after the main process reaps A using os.waitpid() so that it no longer exists, the thread refuses to join with the main thread.
Only after all the children, B, C, D are killed, does Popen.communicate finally return.
Is this behavior actually expected from the module? A recursive wait might be useful in some cases, but it's certainly not appropriate as the default behavior for Popen.communicate. And if this is the intended behavior, is there any way to override it?
Here's a very simple example:
from subprocess import PIPE, Popen
from threading import Thread
import os
import time
import signal

DEVNULL = open(os.devnull, 'w')
proc = Popen(["/bin/bash"], stdin=PIPE, stdout=PIPE,
             stderr=DEVNULL, start_new_session=True)

def thread_function():
    print("Entering thread")
    return proc.communicate(input=b"nohup sleep 100 &\nexit\n")

thread = Thread(target=thread_function)
thread.start()
time.sleep(1)
proc.kill()
while True:
    thread.join(timeout=5)
    if not thread.is_alive():
        break
    print("Thread still alive")
This is on Linux.

I think this comes from a fairly natural way to write the Popen.communicate() method on Linux. proc.communicate() appears to read the stdout file descriptor until it returns EOF, which normally happens when the process dies. Then it does the wait to get the exit code of the process.
In your example, the sleep process inherits the stdout file descriptor from the bash process. So when the bash process dies, communicate() doesn't get an EOF on the stdout pipe, as the sleep still has it open. The simplest way to fix this is to change the communicate line to:
return proc.communicate(input=b"nohup sleep 100 >/dev/null&\nexit\n")
This causes your thread to end as soon as bash exits... due to the exit, not your proc.kill(), in this case. However, the sleep is still running after bash dies, whether you use the exit statement or the proc.kill() call. If you want to kill the sleep as well, I would use
os.killpg(proc.pid, signal.SIGTERM)
instead of proc.kill(); because the process was started with start_new_session=True, its process group ID equals proc.pid. The more general problem of killing B, C and D if they change their process group is more complex.
Additional data:
I couldn't find any official documentation for this method of proc.communicate, but I forgot the most obvious place :-) I found it with the help of this answer. The docs for communicate say:
Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate.
You are getting stuck at step 2: Read until end-of-file, because the sleep is keeping the pipe open.
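A sketch of the suggested fix as a reusable helper, assuming Linux/POSIX. communicate_with_group_kill and its grace/timeout parameters are hypothetical; the usage line mirrors the question with /bin/sh instead of bash. Killing the whole process group reaches the backgrounded sleep that keeps the stdout pipe open, so communicate() finally sees end-of-file.

```python
import os
import signal
import subprocess
import threading


def communicate_with_group_kill(cmd, grace=1.0, timeout=10.0):
    """Run cmd, read its stdout in a thread, and if the reader is still
    blocked after `grace` seconds, SIGTERM the child's whole process group.

    Returns (returncode, stdout_bytes, reader_still_alive).
    """
    # start_new_session=True makes the child a session and process-group
    # leader, so its pgid equals proc.pid; descendants that don't switch
    # groups (like a backgrounded sleep) stay in that group.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, start_new_session=True)
    result = {}

    def reader():
        # Returns only after EOF on stdout, i.e. once *every* process holding
        # the pipe's write end has exited, not just proc itself.
        result["out"], _ = proc.communicate()

    thread = threading.Thread(target=reader)
    thread.start()
    thread.join(timeout=grace)
    if thread.is_alive():
        try:
            os.killpg(proc.pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the whole group has already exited
        thread.join(timeout=timeout)
    return proc.returncode, result.get("out"), thread.is_alive()
```

For example, communicate_with_group_kill(["/bin/sh", "-c", "sleep 100 & exit 0"]) returns after about a second instead of hanging for 100. Processes like B, C and D that move to another group are still not reached.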

Forking and exiting from child in python

I'm trying to fork a process, do something in the child and then exit from it (see code below). To exit I first tried sys.exit which turned out to be a problem because an intermediate function caught the SystemExit exception (as in the code below) and so the child didn't actually terminate. I figured out that I should use os._exit instead. Now the child terminates, but I still see defunct processes lying around (when I do ps -ef). Is there a way to avoid these?
import os, sys

def fctn():
    if os.fork() != 0:
        return 0
    # sys.exit(0)
    os._exit(0)

while True:
    str = raw_input()
    try:
        print(fctn())
    except SystemExit:
        print('Caught SystemExit.')
Edit: this was actually not really a Python question but more of a Unix question (so I guess results may vary depending on the system). Ivan's answer suggests that I should do something like
def handleSIGCHLD(sig, frame):
    os.wait()

signal.signal(signal.SIGCHLD, handleSIGCHLD)
while for me a simple
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
also works.
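Because several exiting children can be reported by a single SIGCHLD, a handler that calls os.wait() once may leave some zombies behind. A POSIX-only sketch of a handler that reaps every exited child without blocking, using os.waitpid(-1, os.WNOHANG); the names reap_children and install_reaper are hypothetical. (Setting SIGCHLD to SIG_IGN, as above, tells the kernel not to keep zombies at all on Linux and per POSIX.1-2001, which is why that simpler variant also works.)

```python
import os
import signal


def reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has already exited."""
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none of them has exited yet


def install_reaper():
    """Install reap_children and return the previous SIGCHLD handler."""
    return signal.signal(signal.SIGCHLD, reap_children)
```

One caveat: such a handler also reaps children started with subprocess, so Popen.wait() can no longer see their real exit status.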
And then it's probably true that I should use some library...

You should wait() for a child to remove its zombie process entry from the table.
Finally, to offload tasks to children, you may be better off with multiprocessing.
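A minimal sketch of the multiprocessing route. square and run_in_child are hypothetical stand-ins for the child's work. The explicit "fork" start method is an assumption that keeps the sketch POSIX-only and runnable as a plain script; with the default start method, guard the entry point with if __name__ == "__main__":.

```python
import multiprocessing


def square(n, results):
    # Hypothetical stand-in for "do something in the child".
    results.put(n * n)


def run_in_child(n):
    """Run square(n) in a child process; return (result, child exit code)."""
    # Explicit "fork" context: POSIX only, but needs no __main__ guard.
    ctx = multiprocessing.get_context("fork")
    results = ctx.Queue()
    worker = ctx.Process(target=square, args=(n, results))
    worker.start()
    value = results.get()  # read before join() so the child never blocks on a full queue
    worker.join()          # waits for the child and reaps it: no <defunct> entry remains
    return value, worker.exitcode


if __name__ == "__main__":
    print(run_in_child(7))  # (49, 0)
```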

You are better off using the subprocess module for your needs. It's the preferred way of forking off a process.
https://docs.python.org/2/library/subprocess.html#subprocess.check_call

Is there a way to make os.killpg not kill the script that calls it?

I have a subprocess which I open, which calls other processes.
I use os.killpg(os.getpgid(subOut.pid), signal.SIGTERM) to kill the entire group, but this kills the python script as well. Even when I call a python script with os.killpg from a second python script, this kills the second script as well. Is there a way to make os.killpg not stop the script?
Another solution would be to individually kill every child process. However, even using
import os
import signal
import psutil

p = psutil.Process(subOut.pid)
child_pid = p.children(recursive=True)
for pid in child_pid:
    os.kill(pid.pid, signal.SIGTERM)
does not correctly give me all the PIDs of the children.
And you know what they say... don't kill the script that calls you...

A bit late to answer, but since Google took me here while looking for a related problem: the reason your script gets killed is that its children, by default, inherit its process group ID. You can tell subprocess.Popen to create a new process group for your subprocess, though it's a bit tricky: you have to pass os.setpgrp as the preexec_fn parameter. This calls setpgrp() (without any arguments) in the newly created (forked) process before it does the exec, which sets the process group ID of the new process to its own PID (thus creating a new group). The documentation warns that preexec_fn is not safe in the presence of threads and can deadlock the child before exec. As an alternative, you can use start_new_session=True, but that creates not only a new process group but also a new session. (That means that if you close your terminal session while your script is running, the children will not be terminated. This may or may not be a problem.)
As a side note, if you are on Windows, you can simply pass subprocess.CREATE_NEW_PROCESS_GROUP in the creationflags parameter.
Here is what it looks like in detail:
import os
import signal
import subprocess

subOut = subprocess.Popen(['your', 'subprocess', ...], preexec_fn=os.setpgrp)
# when it's time to kill
os.killpg(os.getpgid(subOut.pid), signal.SIGTERM)
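On Python 3.11 and later, Popen has a process_group parameter that makes the same setpgid() call in the child without a preexec_fn, which sidesteps the threading caveat. A POSIX-only sketch; start_in_own_group and kill_group are hypothetical helper names.

```python
import os
import signal
import subprocess
import sys


def start_in_own_group(cmd):
    """Start cmd as the leader of a new process group (pgid == its pid)."""
    if sys.version_info >= (3, 11):
        # setpgid(0, 0) is done in the child before exec; no preexec_fn needed
        return subprocess.Popen(cmd, process_group=0)
    # Older Pythons: fall back to the preexec_fn approach (not thread-safe)
    return subprocess.Popen(cmd, preexec_fn=os.setpgrp)


def kill_group(proc, sig=signal.SIGTERM):
    """Signal the child's whole group (never this script's), then reap the child."""
    try:
        os.killpg(proc.pid, sig)  # pgid == proc.pid by construction
    except ProcessLookupError:
        pass  # every member of the group has already exited
    return proc.wait()  # collect the exit status so no zombie is left


if __name__ == "__main__":
    child = start_in_own_group([sys.executable, "-c", "import time; time.sleep(100)"])
    print(kill_group(child))  # the script itself survives; -15 (-signal.SIGTERM)
```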

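Create a process group containing all the immediate children of the calling process, as follows:
import os
import signal
import subprocess

# setpgid() cannot move a child that has already called exec (it fails with
# EACCES), so each child joins the group itself, in preexec_fn, before exec.
p1 = subprocess.Popen(cmd1, preexec_fn=os.setpgrp)  # new group, pgid == p1.pid
pgid = p1.pid
p2 = subprocess.Popen(cmd2, preexec_fn=lambda: os.setpgid(0, pgid))
pn = subprocess.Popen(cmdn, preexec_fn=lambda: os.setpgid(0, pgid))
# Kill all the children and their process trees using the following command
os.killpg(pgid, signal.SIGKILL)
It will kill the whole process tree except the calling script itself, as long as no descendant has moved itself into a different process group.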

atleta's answer above worked for me, but the preexec_fn argument in the call to Popen should be setpgrp, rather than setgrp:
subOut = subprocess.Popen(['your', 'subprocess', ...], preexec_fn=os.setpgrp)
I'm posting this as an answer instead of a comment on atleta's answer because I don't have comment privileges yet.

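An easy way is to make the parent process ignore the signal before sending it.
import os
import signal

sig = signal.SIGTERM  # any catchable signal; SIGKILL cannot be ignored
# Tell this (parent) process to ignore the signal
old_handler = signal.signal(sig, signal.SIG_IGN)
try:
    # Send the signal to our process group and
    # wait for them all to exit.
    os.killpg(os.getpgid(0), sig)
    while True:
        try:
            os.wait()
        except ChildProcessError:
            break  # no children left (os.wait() raises instead of returning -1)
finally:
    # Restore the handler
    signal.signal(sig, old_handler)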

Using subprocess wait() and poll()

I am trying to write a small app that uses the subprocess module.
My program calls an external Bash command that takes some time to process. During this time, I would like to show the user a series of messages like this:
Processing. Please wait...
The output is foo()
How can I do this using Popen.wait() or Popen.poll()? I have read that I need to use Popen.returncode, but I don't know how to get it to actively check the state.

poll() returns None if the process has not yet finished, and the process's exit code (an integer, normally 0 on success) if it has; wait() with a timeout behaves differently, as described below.
Edit:
wait() and poll() have different behaviors:
wait (without the timeout argument) will block and wait for the process to complete.
wait with the timeout argument will wait up to timeout seconds for the process to complete. If it doesn't complete, it raises subprocess.TimeoutExpired. If you catch the exception, you're then welcome to go on, or to wait again.
poll always returns immediately. It effectively does a wait with a timeout of 0, catches the exception, and returns None if the process hasn't completed.
With either wait or poll, if the process has completed, the Popen object's returncode will be set (otherwise it's None; you can check for that as easily as calling wait or poll), and the return value of the function will also be the process's return code.
</Edit>
So I think you should do something like:
import time

while myprocess.poll() is None:
    print("Still working...")
    time.sleep(1)  # sleep a while before polling again
Be aware that if the bash script produces a lot of output, you must use communicate() or something similar; otherwise the stdout or stderr pipe buffer fills up and the child blocks.
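A sketch that prints the question's progress message while the command runs and avoids the full-pipe problem by waiting inside communicate(timeout=...). run_with_progress and its parameters are hypothetical names, and the bash command in the usage block is a stand-in for the real one.

```python
import subprocess


def run_with_progress(cmd, interval=1.0, message="Processing. Please wait..."):
    """Run cmd, printing `message` every `interval` seconds until it finishes.

    Returns (returncode, captured stdout as text).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    while True:
        try:
            # communicate() keeps draining stdout while it waits, so a child
            # that writes a lot can't stall on a full pipe buffer.
            out, _ = proc.communicate(timeout=interval)
            break
        except subprocess.TimeoutExpired:
            # The docs say retrying communicate() after a timeout loses no output.
            print(message)
    return proc.returncode, out


if __name__ == "__main__":
    # Hypothetical stand-in for the question's long-running Bash command.
    code, output = run_with_progress(["bash", "-c", "sleep 3; echo foo"])
    print("The output is", output.strip())
```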

extraneon's answer is a little backwards. Both wait() and poll() return the process's exit code if the process has finished. The poll() method returns None if the process is still running, and the wait() method blocks until the process exits:
Check out the following page: https://docs.python.org/3.4/library/subprocess.html#popen-objects
Popen.poll()
Check if child process has terminated. Set and return returncode attribute.
Popen.wait()
Wait for child process to terminate. Set and return returncode attribute.
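A small demonstration of the difference, assuming only that a Python interpreter is available to act as the child; poll_vs_wait is a hypothetical name.

```python
import subprocess
import sys


def poll_vs_wait(exit_code=3):
    """Show what poll() and wait() return before and after the child exits."""
    child_code = f"import sys, time; time.sleep(0.5); sys.exit({exit_code})"
    proc = subprocess.Popen([sys.executable, "-c", child_code])
    before = proc.poll()   # None: the child is still sleeping
    waited = proc.wait()   # blocks until the child exits, returns its exit code
    after = proc.poll()    # returns immediately with the same exit code
    return before, waited, after, proc.returncode


if __name__ == "__main__":
    print(poll_vs_wait())  # (None, 3, 3, 3)
```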

How do I run a sub-process, display its output in a GUI and allow it to be terminated?

I have been trying to write an application that runs subprocesses and (among other things) displays their output in a GUI and allows the user to click a button to cancel them. I start the processes like this:
import Queue  # Python 2 (the question targets 2.6); named queue in Python 3
import subprocess
import threading

queue = Queue.Queue(500)
process = subprocess.Popen(
    command,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT)
iothread = threading.Thread(
    target=simple_io_thread,
    args=(process.stdout, queue))
iothread.daemon = True
iothread.start()
where simple_io_thread is defined as follows:
def simple_io_thread(pipe, queue):
    while True:
        line = pipe.readline()
        queue.put(line, block=True)
        if line == "":
            break
This works well enough. In my UI I periodically do non-blocking "get"s from the queue. However, my problems come when I want to terminate the subprocess. (The subprocess is an arbitrary process, not something I wrote myself.) I can use the terminate method to terminate the process, but I do not know how to guarantee that my I/O thread will terminate. It will normally be doing blocking I/O on the pipe. This may or may not end some time after I terminate the process. (If the subprocess has spawned another subprocess, I can kill the first subprocess, but the second one will still keep the pipe open. I'm not even sure how to get such grandchildren to terminate cleanly.) After that, the I/O thread will try to enqueue the output, but I don't want to commit to reading from the queue indefinitely.
Ideally I would like some way to request termination of the subprocess, block for a short (<0.5s) amount of time and after that be guaranteed that the I/O thread has exited (or will exit in a timely fashion without interfering with anything else) and that I can stop reading from the queue.
It's not critical to me that a solution uses an I/O thread. If there's another way to do this that works on Windows and Linux with Python 2.6 and a Tkinter GUI that would be fine.
EDIT - Will's answer and other things I've seen on the web about doing this in other languages suggest that the operating system expects you just to close the file handle on the main thread and then the I/O thread should come out of its blocking read. However, as I described in the comment, that doesn't seem to work for me. If I do this on the main thread:
process.stdout.close()
I get:
IOError: close() called during concurrent operation on the same file object.
...on the main thread. If I do this on the main thread:
os.close(process.stdout.fileno())
I get:
close failed in file object destructor: IOError: [Errno 9] Bad file descriptor
...later on in the main thread when it tries to close the file handle itself.

I know this is an old post, but in case it still helps anyone, I think your problem could be solved by passing the subprocess.Popen instance to io_thread, rather than its output stream.
If you do that, then you can replace your while True: line with while process.poll() is None:.
process.poll() checks for the subprocess return code; if the process hasn't finished, then there isn't one (i.e. process.poll() is None). You can then do away with if line == "": break.
I came here because I wrote a very similar script today, and I got the same
IOError: close() called during concurrent operation on the same file object. errors.
Again, in case it helps, I think my problems stemmed from my io_thread doing some overly eager garbage collection and closing a file handle I gave it (I'm probably wrong, but it works now). Mine is different, though, in that it's not daemonic, and it iterates over subprocess.stdout rather than using a while loop, i.e.:
def io_thread(subprocess, logfile, lock):
    for line in subprocess.stdout:
        lock.acquire()
        print line,
        lock.release()
        logfile.write(line)
I should also probably mention that I pass the bufsize argument to subprocess.Popen, so that it's line buffered.

This is probably old, but still useful to anyone coming from a search engine...
The reason it shows that message is that once the subprocess has completed, its file descriptors are closed, so the daemon thread (which is running concurrently) tries to use those closed descriptors and raises the error.
Joining the thread before calling the subprocess's wait() or communicate() method should be enough to avoid the error.
my_thread.join()
print my_thread.is_alive()
my_popen.communicate()

In the code that terminates the process, you could also explicitly os.close() the pipe that your thread is reading from?

You should close the write end of the pipe instead... but the way you wrote the code, you cannot access it. To do that you should:
1. create a pipe;
2. pass the write end's file descriptor to Popen's stdout;
3. read lines from the read end in simple_io_thread.
Now you can close the write end, and the reading thread will finish gracefully.
queue = Queue.Queue(500)
r, w = os.pipe()
process = subprocess.Popen(
    command,
    stdout=w,
    stderr=subprocess.STDOUT)
iothread = threading.Thread(
    target=simple_io_thread,
    args=(os.fdopen(r), queue))
iothread.daemon = True
iothread.start()
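Now, by calling
os.close(w)
you close the parent's copy of the write end. iothread then shuts down without any exception as soon as every other process holding the write end (the child and any grandchildren it spawned) has exited, because only then does the read end report end-of-file.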
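Pulling these answers together, here is a sketch of a cancellable reader. It assumes Python 3.7+ (queue module, text=True) and POSIX (start_new_session, os.killpg), so unlike the question's Python 2.6 setup it does not cover Windows. The names start, cancel, _pump and EOF are hypothetical. Killing the child's whole process group also takes down grandchildren holding the pipe, so the reader reaches end-of-file and exits within the short timeout.

```python
import os
import queue
import signal
import subprocess
import threading

EOF = object()  # sentinel: the reader thread has finished


def _pump(pipe, q):
    # The loop ends at end-of-file, i.e. once every process holding the
    # write end of the pipe (child and grandchildren) has exited.
    for line in pipe:
        q.put(line)
    pipe.close()
    q.put(EOF)


def start(command):
    """Start command with its output going to a queue (POSIX only)."""
    q = queue.Queue()
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True,
                            start_new_session=True)  # pgid == proc.pid
    reader = threading.Thread(target=_pump, args=(proc.stdout, q), daemon=True)
    reader.start()
    return proc, reader, q


def _signal_group(proc, sig):
    try:
        os.killpg(proc.pid, sig)
    except ProcessLookupError:
        pass  # the whole group is already gone


def cancel(proc, reader, timeout=0.5):
    """Terminate the child's whole group; True if the reader thread has exited."""
    # Signalling the group also reaches grandchildren that inherited stdout,
    # which would otherwise keep the pipe open and the reader blocked.
    _signal_group(proc, signal.SIGTERM)
    reader.join(timeout)
    if reader.is_alive():
        _signal_group(proc, signal.SIGKILL)  # SIGKILL cannot be ignored
        reader.join(timeout)
    proc.wait()  # reap the child so it does not linger as a zombie
    return not reader.is_alive()
```

On the GUI side, a Tkinter after() callback can call q.get_nowait() repeatedly and stop reading once it sees EOF.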
