Python subprocess.communicate hangs when parent leaves zombies

I'm trying to use Popen to create a subprocess A, along with a thread that communicates with it using Popen.communicate. The main process waits on the thread using Thread.join with a specified timeout and kills A after that timeout expires, which should cause the thread to die as well.
However, this doesn't seem to work when A itself spawns more subprocesses B, C and D, in different process groups than A, that refuse to die. Even after A is dead and labelled defunct, and even after the main process reaps A using os.waitpid() so that it no longer exists, the thread refuses to join with the main thread.
Only after all the children B, C, D are killed does Popen.communicate finally return.
Is this behavior actually expected from the module? A recursive wait might be useful in some cases, but it's certainly not appropriate as the default behavior for Popen.communicate. And if this is the intended behavior, is there any way to override it?
Here's a very simple example:
from subprocess import PIPE, Popen
from threading import Thread
import os
import time
import signal

DEVNULL = open(os.devnull, 'w')

proc = Popen(["/bin/bash"], stdin=PIPE, stdout=PIPE,
             stderr=DEVNULL, start_new_session=True)

def thread_function():
    print("Entering thread")
    return proc.communicate(input=b"nohup sleep 100 &\nexit\n")

thread = Thread(target=thread_function)
thread.start()

time.sleep(1)
proc.kill()

while True:
    thread.join(timeout=5)
    if not thread.is_alive():
        break
    print("Thread still alive")
This is on Linux.

I think this comes from a fairly natural way to write the Popen.communicate method on Linux. proc.communicate() reads the stdout file descriptor, which returns EOF only once every process holding the write end of the pipe has closed it. Then it waits to collect the exit code of the process.
In your example, the sleep process inherits the stdout file descriptor from the bash process. So when the bash process dies, proc.communicate() doesn't get an EOF on the stdout pipe, as the sleep still has it open. The simplest way to fix this is to change the communicate line to:
return proc.communicate(input=b"nohup sleep 100 >/dev/null&\nexit\n")
This causes your thread to end as soon as bash dies (due to the exit, not your proc.kill, in this case). However, the sleep is still running after bash dies, whether bash exits via the exit statement or the proc.kill() call. If you want to kill the sleep as well, I would use
os.killpg(proc.pid, signal.SIGTERM)
instead of the proc.kill(). Since bash was started with start_new_session=True, its pid is also its process-group ID. The more general problem of killing B, C and D if they change their process group is more complex.
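For concreteness, here is a minimal sketch combining both suggestions, reusing the imports and setup from the question's example (Linux only; my rearrangement, not the asker's code):
proc = Popen(["/bin/bash"], stdin=PIPE, stdout=PIPE,
             stderr=DEVNULL, start_new_session=True)

def thread_function():
    # sleep's stdout goes to /dev/null, so communicate() sees EOF as soon
    # as bash itself exits
    return proc.communicate(input=b"nohup sleep 100 >/dev/null &\nexit\n")

thread = Thread(target=thread_function)
thread.start()

time.sleep(1)
# start_new_session=True made bash a process-group leader, so proc.pid
# doubles as the group ID; signalling the group also kills the sleep
os.killpg(proc.pid, signal.SIGTERM)
thread.join()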
Additional data:
I couldn't find any official documentation for this behavior of proc.communicate, but I had forgotten the most obvious place :-) I found it with the help of this answer. The docs for communicate say:
Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate.
You are getting stuck at step 2: Read until end-of-file, because the sleep is keeping the pipe open.

Related

Python subprocess polling not giving return code when used with Java process

I'm having a problem with subprocess poll not returning the return code when the process has finished.
I found out how to set a timeout on subprocess.Popen and used that as the basis for my code. However, I have a call that uses Java that doesn't correctly report the return code so each call "times out" even though it is actually finished. I know the process has finished because when removing the poll timeout check, the call runs without issue returning a good exit code and within the time limit.
Here is the code I am testing with.
import subprocess
import time

def execute(command):
    print('start command: {}'.format(command))
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print('wait')
    wait = 10
    while process.poll() is None and wait > 0:
        time.sleep(1)
        wait -= 1
    print('done')
    if wait == 0:
        print('terminate')
        process.terminate()
    print('communicate')
    stdout, stderr = process.communicate()
    print('rc')
    exit_code = process.returncode
    if exit_code != 0:
        print('got bad rc')

if __name__ == '__main__':
    execute(['ping', '-n', '15', '127.0.0.1'])  # correctly times out
    execute(['ping', '-n', '5', '127.0.0.1'])   # correctly runs within the time limit
    # incorrectly times out
    execute(['C:\\dev\\jdk8\\bin\\java.exe', '-jar', 'JMXQuery-0.1.8.jar', '-url', 'service:jmx:rmi:///jndi/rmi://localhost:18080/jmxrmi', '-json', '-q', 'java.lang:type=Runtime;java.lang:type=OperatingSystem'])
You can see that the ping examples behave correctly: the first is designed to time out and the second to run within the time limit, and both do exactly that. However, the final one (using jmxquery to get tomcat metrics) doesn't return the exit code and therefore "times out" and has to be terminated, which then causes it to return an error code of 1.
Is there something I am missing in the way subprocess poll is interacting with this Java process that is causing it to not return an exit code? Is there a way to get a timeout option to work with this?
This has the same cause as a number of existing questions, but the desire to impose a timeout requires a different answer.
The OS deliberately gives only a small amount of buffer space to each pipe. When a process writes to one that is full (because the reader has not yet consumed the previous output), it blocks. (The reason is that a producer that is faster than its consumer would otherwise be able to quickly use a great deal of memory for no gain.) Therefore, if you want to do more than one of the following with a subprocess, you have to interleave them rather than doing each in turn:
Read from standard output
Read from standard error (unless it’s merged via subprocess.STDOUT)
Wait for the process to exit, or for a timeout to elapse
Of course, the subprocess might close its streams before it exits, write useful output after you notice the timeout and before you kill it, and/or start additional processes that keep the pipe open indefinitely, so you might want to have multiple timeouts. Probably what’s most informative is the EOF on the pipe, so repeatedly use something like select to wait for (however much is left of) the timeout, issue single reads on the streams that are ready, and wait (with another timeout if you’re concerned about hangs after an early stream closure) on EOF. If the timeout occurs instead, (try to) kill the subprocess, and consider issuing non-blocking reads (or another timeout loop) to get any last available output before closing the pipes.
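A rough sketch of that loop (my illustration, not code from the question; POSIX-only, since select only works on pipes there, and run_with_timeout is a made-up name):
import select
import subprocess
import time

def run_with_timeout(command, timeout):
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    deadline = time.monotonic() + timeout
    streams = [proc.stdout, proc.stderr]
    chunks = {proc.stdout: [], proc.stderr: []}
    while streams:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            proc.kill()  # timeout: give up (final non-blocking reads omitted)
            break
        ready, _, _ = select.select(streams, [], [], remaining)
        for stream in ready:
            data = stream.read1(4096)  # a single read per wakeup, as above
            if data:
                chunks[stream].append(data)
            else:
                streams.remove(stream)  # EOF: stop watching this stream
    proc.wait()
    return (b''.join(chunks[proc.stdout]),
            b''.join(chunks[proc.stderr]), proc.returncode)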
Using the other answer by @DavisHerring as the basis for more research, I came across an approach that worked for my original case. Here is the code that came out of that.
import subprocess
import threading

def execute(command):
    print('start command: {}'.format(command))
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    timer = threading.Timer(10, terminate_process, [process])
    timer.start()
    print('communicate')
    stdout, stderr = process.communicate()
    print('rc')
    exit_code = process.returncode
    timer.cancel()
    if exit_code != 0:
        print('got bad rc')

def terminate_process(p):
    try:
        p.terminate()
    except OSError:
        pass  # ignore error
It uses the threading.Timer to make sure that the process doesn't go over the time limit and terminates the process if it does. It otherwise waits for a response back and cancels the timer once it finishes.

not able to terminate the process in multiprocessing python (linux)

I am new to Python and to multiprocessing. I am starting one process and calling one shell script through this process. After terminating this process, the shell script keeps running in the background; how do I kill it? Please help.
Python script (test.py)
#!/usr/bin/python
import time
import os
import sys
import multiprocessing

# test process
def test_py_process():
    os.system("./test.sh")
    return

p = multiprocessing.Process(target=test_py_process)
p.start()
print 'STARTED:', p, p.is_alive()
time.sleep(10)
p.terminate()
print 'TERMINATED:', p, p.is_alive()
shell script (test.sh)
#!/bin/bash
for i in {1..100}
do
    sleep 1
    echo "Welcome $i times"
done
The reason is that the child process spawned by the os.system call spawns a child process itself. As explained in the multiprocessing docs, descendant processes of the process will not be terminated: they will simply become orphaned. So p.terminate() kills the process you created, but the OS process (/bin/bash ./test.sh) is simply reparented to init and continues executing.
You could use subprocess.Popen instead:
import time
from subprocess import Popen

if __name__ == '__main__':
    p = Popen("./test.sh")
    print 'STARTED:', p, p.poll()
    time.sleep(10)
    p.kill()
    print 'TERMINATED:', p, p.poll()
Edit: @Florian Brucker beat me to it. He deserves the credit for answering the question first. I'm still keeping this answer for the alternate approach using subprocess, which is recommended over os.system() in the documentation for os.system() itself.
os.system runs the given command in a separate process. Therefore, you have three processes:
The main process in which your script runs
The process in which test_py_process runs
The process in which the bash script runs
Process 2 is a child process of process 1, and process 3 is a child of process 2.
When you call Process.terminate from within process 1, this sends the SIGTERM signal to process 2. That process will then terminate. However, the SIGTERM signal is not automatically propagated to the child processes of process 2! This means that process 3 is not notified when process 2 exits, and hence keeps on running as a child of the init process.
The best way to terminate process 3 depends on your actual problem setting, see this SO thread for some suggestions.
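One common suggestion from that thread, sketched minimally (my code, not from the linked answers; Linux, Python 3, where start_new_session replaces the older preexec_fn=os.setsid trick):
import os
import signal
import subprocess
import time

# run test.sh in its own process group, so the whole group can be signalled
p = subprocess.Popen("./test.sh", start_new_session=True)
time.sleep(10)
os.killpg(p.pid, signal.SIGTERM)  # reaches the shell and its sleep children
p.wait()                          # reap the shell so it doesn't stay a zombie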

How to kill a subprocess started in a thread?

I am trying to run the Robocopy command (but I am curious about any subprocess) from Python on Windows. The code is pretty simple and works well. It is:
def copy(media_path, destination_path, log_path):
    with Popen(['Robocopy', media_path, destination_path, '/E', '/mir', '/TEE',
                '/log+:' + log_path], stdout=PIPE, bufsize=1,
               universal_newlines=True) as Robocopy:
        Robocopy.wait()
        returncode = Robocopy.returncode
Additionally I am running it in a separate thread with the following:
threading.Thread(target=copy, args=(media_path, destination_path, log_path,), daemon=True)
However, there are certain instances where I want to stop the robocopy (akin to closing the CMD window if it was run from the command line)
Is there a good way to do this in Python?
We fought with reliably killing subprocesses on Windows for a while and eventually came across this:
https://github.com/andreisavu/python-process/blob/master/killableprocess.py
It implements a kill() method for killing your subprocess. We've had really good results with it.
You will need to somehow pass the process object out of the thread and call kill() from another thread, or poll in your thread with wait() using a timeout while monitoring some kind of global-ish flag.
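A sketch of that second suggestion (hypothetical names such as cancel_event; Python 3.3+ for the wait() timeout):
import subprocess
import threading

cancel_event = threading.Event()  # set from another thread to request a stop

def copy(media_path, destination_path, log_path):
    proc = subprocess.Popen(['Robocopy', media_path, destination_path,
                             '/E', '/mir', '/TEE', '/log+:' + log_path])
    while True:
        try:
            proc.wait(timeout=0.5)  # returns as soon as Robocopy exits
            break
        except subprocess.TimeoutExpired:
            if cancel_event.is_set():
                proc.kill()         # another thread asked us to stop
                proc.wait()
                break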
If the process doesn't start other processes then process.kill() should work:
import subprocess

class InterruptableProcess:
    def __init__(self, *args):
        self._process = subprocess.Popen(args)

    def interrupt(self):
        self._process.kill()
I don't see why you would need it on Windows, but you could run Thread(target=self._process.wait, daemon=True).start() if you'd like.
If there is a possibility that the process may start other processes in turn, then you might need a Job object to kill all the descendant processes. It seems killableprocess.py, which is suggested by @rrauenza, uses this approach (I haven't tested it). See Python: how to kill child process(es) when parent dies?
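Not from either answer above, but if a third-party dependency is acceptable, the psutil package offers a portable way to kill a process together with its descendants; a sketch:
import psutil

def kill_process_tree(pid):
    try:
        parent = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return
    for child in parent.children(recursive=True):  # snapshot of descendants
        try:
            child.kill()
        except psutil.NoSuchProcess:
            pass  # already gone
    parent.kill()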

Python avoid orphan processes

I'm using python to benchmark something. This can take a large amount of time, and I want to set a (global) timeout. I use the following script (summarized):
import signal
import subprocess

class TimeoutException(Exception):
    pass

def timeout_handler(signum, frame):
    raise TimeoutException()

signal.signal(signal.SIGALRM, timeout_handler)
# Halt problem after half an hour
signal.alarm(1800)
try:
    while solution is None:
        guess = guess()
        try:
            with open(solutionfname, 'wb') as solutionf:
                solverprocess = subprocess.Popen(["solver", problemfname],
                                                 stdout=solutionf)
                solverprocess.wait()
        finally:
            # `solverprocess.poll() == None` instead of try didn't work either
            try:
                solverprocess.kill()
            except:
                # Solver process was already dead
                pass
except TimeoutException:
    pass
# Cancel alarm if it's still active
signal.alarm(0)
However, it sometimes leaves orphan processes behind, and I can't reliably recreate the circumstances. Does anyone know the correct way to prevent this?
You simply have to wait after killing the process.
The documentation for the kill() method states:
Kills the child. On Posix OSs the function sends SIGKILL to the child.
On Windows kill() is an alias for terminate().
In other words, if you aren't on Windows, you are only sending a signal to the subprocess.
This will create a zombie process because the parent process didn't read the return value of the subprocess.
The kill() and terminate() methods are just shortcuts to send_signal(SIGKILL) and send_signal(SIGTERM).
Try adding a call to wait() after the kill(). This is even shown in the example under the documentation for communicate():
proc = subprocess.Popen(...)
try:
    outs, errs = proc.communicate(timeout=15)
except TimeoutExpired:
    proc.kill()
    outs, errs = proc.communicate()
Note the call to communicate() after the kill(). (It is equivalent to calling wait() and also reading the outputs of the subprocess.)
I want to clarify one thing: it seems like you don't understand exactly what a zombie process is. A zombie process is a terminated process. The kernel keeps the process in the process table until the parent process reads its exit status. I believe all memory used by the subprocess is actually reused; the kernel only has to keep track of the exit status of such a process.
So, the zombie processes you see aren't running. They are already completely dead, and that's why they are called zombie. They are "alive" in the process table, but aren't really running at all.
Calling wait() does exactly this: wait till the subprocess ends and read the exit status. This allows the kernel to remove the subprocess from the process table.
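To see this concretely (POSIX; a toy example rather than the asker's solver):
import subprocess

proc = subprocess.Popen(["sleep", "100"])
proc.kill()  # the child becomes a zombie: dead, but still in the process table
proc.wait()  # reaps it, letting the kernel drop the table entry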
On Linux, you can use python-prctl.
Define a preexec function such as:
def pre_exec():
    import signal
    import prctl  # provided by the python-prctl package
    prctl.set_pdeathsig(signal.SIGTERM)
And have your Popen call pass it.
subprocess.Popen(..., preexec_fn=pre_exec)
It's as simple as that. Now the child process will die rather than become an orphan if the parent dies.
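For completeness, a self-contained sketch of the whole approach (assumes the python-prctl package is installed; the command is just a placeholder):
import signal
import subprocess
import prctl

def pre_exec():
    # ask the kernel to send SIGTERM to this child if its parent dies first
    prctl.set_pdeathsig(signal.SIGTERM)

proc = subprocess.Popen(["sleep", "100"], preexec_fn=pre_exec)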
If you don't like the external dependency of python-prctl you can also use the older prctl. Instead of
prctl.set_pdeathsig(signal.SIGTERM)
you would have
prctl.prctl(prctl.PDEATHSIG, signal.SIGTERM)

How do I run a sub-process, display its output in a GUI and allow it to be terminated?

I have been trying to write an application that runs subprocesses and (among other things) displays their output in a GUI and allows the user to click a button to cancel them. I start the processes like this:
queue = Queue.Queue(500)
process = subprocess.Popen(
    command,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT)
iothread = threading.Thread(
    target=simple_io_thread,
    args=(process.stdout, queue))
iothread.daemon = True
iothread.start()
where simple_io_thread is defined as follows:
def simple_io_thread(pipe, queue):
    while True:
        line = pipe.readline()
        queue.put(line, block=True)
        if line == "":
            break
This works well enough. In my UI I periodically do non-blocking "get"s from the queue. However, my problems come when I want to terminate the subprocess. (The subprocess is an arbitrary process, not something I wrote myself.) I can use the terminate method to terminate the process, but I do not know how to guarantee that my I/O thread will terminate. It will normally be doing blocking I/O on the pipe. This may or may not end some time after I terminate the process. (If the subprocess has spawned another subprocess, I can kill the first subprocess, but the second one will still keep the pipe open. I'm not even sure how to get such grand-children to terminate cleanly.) After that the I/O thread will try to enqueue the output, but I don't want to commit to reading from the queue indefinitely.
Ideally I would like some way to request termination of the subprocess, block for a short (<0.5s) amount of time and after that be guaranteed that the I/O thread has exited (or will exit in a timely fashion without interfering with anything else) and that I can stop reading from the queue.
It's not critical to me that a solution uses an I/O thread. If there's another way to do this that works on Windows and Linux with Python 2.6 and a Tkinter GUI that would be fine.
EDIT - Will's answer and other things I've seen on the web about doing this in other languages suggest that the operating system expects you just to close the file handle on the main thread and then the I/O thread should come out of its blocking read. However, as I described in the comment, that doesn't seem to work for me. If I do this on the main thread:
process.stdout.close()
I get:
IOError: close() called during concurrent operation on the same file object.
...on the main thread. If I do this on the main thread:
os.close(process.stdout.fileno())
I get:
close failed in file object destructor: IOError: [Errno 9] Bad file descriptor
...later on in the main thread when it tries to close the file handle itself.
I know this is an old post, but in case it still helps anyone, I think your problem could be solved by passing the subprocess.Popen instance to io_thread, rather than its output stream.
If you do that, then you can replace your while True: line with while process.poll() is None:.
process.poll() checks the subprocess's return code; if the process hasn't finished, there isn't one yet (i.e. process.poll() is None). You can then do away with if line == "": break.
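A sketch of the suggested change (my rendering of the advice, untested against the original GUI code):
def simple_io_thread(process, queue):
    while process.poll() is None:  # None means the process is still running
        line = process.stdout.readline()
        queue.put(line, block=True)
One caveat: output still buffered in the pipe when the process exits can be missed this way, so drain the pipe once more after the loop if trailing lines matter.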
The reason I'm here is that I wrote a very similar script to this today, and I got those IOError: close() called during concurrent operation on the same file object. errors.
Again, in case it helps, I think my problems stemmed from (my) io_thread doing some overly efficient garbage collection and closing a file handle I gave it (I'm probably wrong, but it works now..). Mine's different though, in that it's not daemonic, and it iterates through subprocess.stdout rather than using a while loop, i.e.:
def io_thread(subprocess, logfile, lock):
    for line in subprocess.stdout:
        lock.acquire()
        print line,
        lock.release()
        logfile.write(line)
I should also probably mention that I pass the bufsize argument to subprocess.Popen, so that it's line buffered.
This is probably old enough, but still useful to someone coming from a search engine...
The reason it shows that message is that after the subprocess has completed, it closes the file descriptors; therefore, the daemon thread (which is running concurrently) will try to use those closed descriptors and raise the error.
Joining the thread before calling the subprocess's wait() or communicate() methods should be more than enough to suppress the error.
my_thread.join()
print my_thread.is_alive()
my_popen.communicate()
In the code that terminates the process, you could also explicitly os.close() the pipe that your thread is reading from?
You should close the write end of the pipe instead... but as the code is written you cannot access it. To do that you should:
create a pipe
pass the write end's file descriptor to Popen's stdout
use the read end in simple_io_thread to read lines.
Now you can close the write end and the read thread will shut down gracefully.
queue = Queue.Queue(500)
r, w = os.pipe()
process = subprocess.Popen(
    command,
    stdout=w,
    stderr=subprocess.STDOUT)
iothread = threading.Thread(
    target=simple_io_thread,
    args=(os.fdopen(r), queue))
iothread.daemon = True
iothread.start()
Now, by calling
os.close(w)
you can close the pipe, and iothread will shut down without any exception.
