Why doesn't subprocess.Popen() give live feed? - python

I'm working on an automated framework for a bioinformatics tool. As most software that my program will use is written for Linux and not written in python, I use subprocess to invoke the processes.
The problem I have is that many steps in the pipeline take a very long time, and I want to see live output so I know that it's still working and has not hung. But I also need to capture the output to log any unexpected errors after the process is done.
I found that subprocess.Popen() is what I need for this.
This is the code I use (found here: https://fabianlee.org/2019/09/15/python-getting-live-output-from-subprocess-using-poll/):
import subprocess

# invoke process
process = subprocess.Popen("./test.sh", shell=True,
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                           text=True)

# print stdout while process is still working
while True:
    output = process.stdout.readline()
    if process.poll() is not None:
        break
    if output:
        print("out:", output.strip())

rc = process.poll()
if rc == 0:
    print("Process ended with rc:", rc, "output:", output)
else:
    print("Process ended with rc:", rc, "error:", process.stderr.readline())
It works like a charm when I use this simple bash script as the argument:
#!/bin/bash
for i in $(seq 1 5); do
    echo "iteration" $i
    sleep 1
done
which gives the output:
out: iteration 1
out: iteration 2
out: iteration 3
out: iteration 4
out: iteration 5
Process ended with rc: 0 output:
or this if I deliberately insert an error in the script, e.g.:
Process ended with rc: 2 error: ./test.sh: line 7: syntax error: unexpected end of file
However, when I try it with a real tool (in this case picard ValidateSamFile), it does not give me any live feed, no matter what I have tried:
import subprocess

# invoke process
process = subprocess.Popen("picard ValidateSamFile -I dna_seq/aligned/2064-01/AHWM2NCCXY.RJ-1967-2064-01.6.bam",
                           shell=True,
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                           text=True)

# print stdout while process is still working
while True:
    output = process.stdout.readline()
    if process.poll() is not None:
        break
    if output:
        print("out:", output.strip())

rc = process.poll()
if rc == 0:
    print("Process ended with rc:", rc, "output:", output)
else:
    print("Process ended with rc:", rc, "error:", process.stderr.readline())
I get this after the process is completed:
out: No errors found
Process ended with rc: 0 output:
Any ideas?
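Two likely culprits, going by the answers to the related questions below: picard is a Java tool that writes its progress to stderr rather than stdout, and many programs switch to block buffering when their output is a pipe instead of a terminal. A minimal sketch of reading both streams live (the run_live helper is hypothetical, not part of the original code) that merges stderr into stdout and reads line by line:

```python
import subprocess

def run_live(cmd):
    """Run cmd, echoing merged stdout+stderr line by line; return (rc, lines)."""
    proc = subprocess.Popen(
        cmd, shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so progress messages show up too
        text=True, bufsize=1,      # line-buffered on our side of the pipe
    )
    lines = []
    for line in proc.stdout:       # yields each line as soon as it arrives
        print("out:", line.rstrip())
        lines.append(line.rstrip())
    return proc.wait(), lines
```

If the tool still buffers its output when writing to a pipe, forcing a pseudo-terminal (e.g. with pexpect) is the usual workaround.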

Related

Display process output incrementally using Python subprocess

I'm trying to run "docker-compose pull" from inside a Python automation script and to incrementally display the same output that Docker command would print if it was run directly from the shell. This command prints a line for each Docker image found in the system, incrementally updates each line with the Docker image's download progress (a percentage) and replaces this percentage with a "done" when the download has completed. I first tried getting the command output with subprocess.poll() and (blocking) readline() calls:
import shlex
import subprocess

def run(command, shell=False):
    p = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, shell=shell)
    while True:
        # print one output line
        output_line = p.stdout.readline().decode('utf8')
        error_output_line = p.stderr.readline().decode('utf8')
        if output_line:
            print(output_line.strip())
        if error_output_line:
            print(error_output_line.strip())
        # check if process finished
        return_code = p.poll()
        if return_code is not None and output_line == '' and error_output_line == '':
            break
    if return_code > 0:
        print("%s failed, error code %d" % (command, return_code))

run("docker-compose pull")
The code gets stuck in the first (blocking) readline() call. Then I tried to do the same without blocking:
import select
import shlex
import subprocess
import sys
import time

def run(command, shell=False):
    p = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, shell=shell)
    io_poller = select.poll()
    io_poller.register(p.stdout.fileno(), select.POLLIN)
    io_poller.register(p.stderr.fileno(), select.POLLIN)
    while True:
        # poll IO for output
        io_events_list = []
        while not io_events_list:
            time.sleep(1)
            io_events_list = io_poller.poll(0)
        # print new output
        for event in io_events_list:
            # must be tested because non-registered events (eg POLLHUP) can also be returned
            if event[1] & select.POLLIN:
                if event[0] == p.stdout.fileno():
                    output_str = p.stdout.read(1).decode('utf8')
                    print(output_str, end="")
                if event[0] == p.stderr.fileno():
                    error_output_str = p.stderr.read(1).decode('utf8')
                    print(error_output_str, end="")
        # check if process finished
        # when the subprocess finishes, io_poller.poll(0) returns a list with 2 select.POLLHUP
        # events (one for stdout, one for stderr) and does not enter the inner loop
        return_code = p.poll()
        if return_code is not None:
            break
    if return_code > 0:
        print("%s failed, error code %d" % (command, return_code))

run("docker-compose pull")
This works, but only the final lines (with "done" at the end) are printed to the screen, when all Docker images downloads have been completed.
Both methods work fine with a command with simpler output such as "ls". Maybe the problem is related to how this Docker command prints incrementally to the screen, overwriting already-written lines? Is there a safe way to incrementally show the exact output of a command in the command line when running it via a Python script?
EDIT: 2nd code block was corrected
Always open STDIN as a pipe, and if you are not using it, close it immediately.
p.stdout.read() will block until the pipe is closed, so your polling code does nothing useful here. It needs modifications.
I suggest not using shell=True.
Instead of *.readline(), try *.read(1) and wait for "\n".
Of course you can do what you want in Python; the question is how. Trouble starts because a child process might have different ideas about what its output should look like. E.g. the process might explicitly want a terminal at the other end, not your process. Or a lot of such simple nonsense. Buffering may also cause problems. You can try starting Python in unbuffered mode to check (/usr/bin/python -u).
If nothing works, then use the pexpect automation library instead of subprocess.
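The terminal point above can be demonstrated with the standard-library pty module: hand the child a pseudo-terminal as stdout and it believes it is writing to a screen, so C stdio stays line-buffered instead of block-buffered (a POSIX-only sketch):

```python
import os
import pty
import subprocess

# Give the child a pseudo-terminal for stdout; to the child this looks like
# a real screen, so its stdio stays line-buffered instead of block-buffered.
master, slave = pty.openpty()
p = subprocess.Popen(["echo", "hello"], stdout=slave, close_fds=True)
os.close(slave)                        # keep only the reading end open here
data = os.read(master, 1024).decode()  # arrives immediately, with \r\n endings
p.wait()
os.close(master)
```

Note that terminals translate "\n" into "\r\n", so output read from the master side needs the carriage returns stripped.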
I have found a solution, based on the first code block of my question:
import shlex
import subprocess

def run(command, shell=False):
    p = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, shell=shell)
    while True:
        # read one char at a time
        output_line = p.stderr.read(1).decode("utf8")
        if output_line != "":
            print(output_line, end="")
        else:
            # check if process finished
            return_code = p.poll()
            if return_code is not None:
                if return_code > 0:
                    raise Exception("Command %s failed" % command)
                break
    return return_code
Notice that docker-compose uses stderr to print its progress instead of stdout. @Dalen has explained that some applications do this when they want their results to be pipeable somewhere, for instance to a file, while still being able to show their progress.
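A tiny demonstration of that point, with a hypothetical inline child standing in for docker-compose: progress-style messages written to stderr never show up if you only read stdout.

```python
import subprocess
import sys

# Hypothetical stand-in for docker-compose: result on stdout, progress on stderr.
child = [sys.executable, "-c",
         "import sys; sys.stderr.write('pulling... done\\n'); print('result')"]
p = subprocess.Popen(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     text=True)
out, err = p.communicate()
# The progress message appears only on the stderr pipe:
print("stdout:", out.strip())
print("stderr:", err.strip())
```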

Run Python script within Python by using `subprocess.Popen` in real time

I want to run a Python script (or any executable, for that manner) from a python script and get the output in real time. I have followed many tutorials, and my current code looks like this:
import subprocess

with open("test2", "w") as f:
    f.write("""import time
print('start')
time.sleep(5)
print('done')""")

process = subprocess.Popen(['python3', "test2"], stdout=subprocess.PIPE)
while True:
    output = process.stdout.readline()
    if output == '' and process.poll() is not None:
        break
    if output:
        print(output.strip())
rc = process.poll()
The first bit just creates the file that will be run, for clarity's sake.
I have two problems with this code:
It does not give the output in real time. It waits until the process has finished.
It does not terminate the loop once the process has finished.
Any help would be very welcome.
EDIT: Thanks to @JohnAnderson for the fix to the first problem: replacing if output == '' and process.poll() is not None: with if output == b'' and process.poll() is not None:
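Equivalently (a sketch, using a trivial inline child in place of the test2 file): passing text=True makes the pipe yield str instead of bytes, so the original '' sentinel works as written:

```python
import subprocess
import sys

# Inline stand-in for the test2 script; text=True makes readline() return str.
p = subprocess.Popen([sys.executable, "-c", "print('start'); print('done')"],
                     stdout=subprocess.PIPE, text=True)
lines = []
while True:
    output = p.stdout.readline()
    if output == '' and p.poll() is not None:  # the str sentinel now matches
        break
    if output:
        lines.append(output.strip())
```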
Last night I've set out to do this using a pipe:
import os
import subprocess

with open("test2", "w") as f:
    f.write("""import time
print('start')
time.sleep(2)
print('done')""")

(readend, writeend) = os.pipe()
p = subprocess.Popen(['python3', '-u', 'test2'], stdout=writeend, bufsize=0)
still_open = True
output = ""
output_buf = os.read(readend, 1).decode()
while output_buf:
    print(output_buf, end="")
    output += output_buf
    if still_open and p.poll() is not None:
        os.close(writeend)
        still_open = False
    output_buf = os.read(readend, 1).decode()
This forces buffering out of the picture and reads one character at a time (to make sure we do not block writes from a process that has filled its buffer), closing the writing end when the process finishes so that read catches the EOF correctly. Having looked at subprocess, though, that turned out to be a bit of an overkill. With PIPE you get most of that for free, and I ended up with the following, which seems to work fine: call readline as many times as necessary to keep emptying the pipe, and once the process has finished you do not have to worry about polling it or making sure the write end of the pipe is closed to correctly detect EOF and get out of the loop:
p = subprocess.Popen(['python3', '-u', 'test2'],
                     stdout=subprocess.PIPE, bufsize=1,
                     universal_newlines=True)
output = ""
output_buf = p.stdout.readline()
while output_buf:
    print(output_buf, end="")
    output += output_buf
    output_buf = p.stdout.readline()
This is a bit less "real-time" as it is basically line buffered.
Note: I've added -u to your Python call, as you also need to make sure your called process' buffering does not get in the way.

Status of Batch Executed from within Python (Using Subprocess)

Similar to many other questions, I have a Windows-based Python script that will try to execute one or more introductory sub-processes by calling batch files and opening them in new command prompt (shell) windows.
I want to wait for these batch files to finish processing and then call an action that will use the output of these introductory processes and continue the code execution.
Based on answers, I have tried the following with no luck. It seems to me that as soon as the batch file starts executing, the sub-process returns status 0 and stops waiting/communicating! I have all the sample code below as well as the output. I would appreciate it if anyone has any hints/tips on how this can be done, if it's doable on Windows.
Popen.wait(),
Popen.communicate(),
Popen.call(),
subprocess.getstatusoutput()
subprocess.check_call()
Python file, start.py:
import subprocess

mycommand = "start test.bat"
process = subprocess.Popen(mycommand, shell=True)
#, stdout=logfile, universal_newlines=True)

if process.poll() == None:
    print("Pre Poll = None")
else:
    print("Pre Poll = Value")

# process.wait()
process.communicate()

if process.poll() == None:
    print("Post Poll = None")
else:
    print("Post Poll = Value")

print("Exit of Loop: ", process.returncode)
Batch file, start.bat:
@echo off
echo Start of Loop
echo .
for /L %%n in (1,1,10000) do echo %%n
echo .
echo End of Loop
The output of Python is:
Pre Poll = None
Post Poll = Value
Exit of Loop: 0
while the batch file is still in the loop.
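The likely explanation: start hands the batch file to a brand-new console window and returns immediately, so the process Popen is tracking (the shell that ran start) exits long before the loop finishes. Running the batch file directly, or using start /wait, makes communicate() block until the real work is done. A cross-platform sketch of the first fix, using a short Python child as a stand-in for the batch file:

```python
import subprocess
import sys

# Stand-in for running test.bat directly (no `start`): Popen now tracks the
# process doing the actual work, so communicate() blocks until it finishes.
child = [sys.executable, "-c", "print('Start of Loop'); print('End of Loop')"]
process = subprocess.Popen(child, stdout=subprocess.PIPE, text=True)
out, _ = process.communicate()   # returns only after the child has exited
print("Exit of Loop:", process.returncode)
```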

Capturing the output of a command in realtime - python

I see that there are several solutions for capturing a command output in realtime when invoked from python. I have a case like this.
run_command.py
import time
for i in range(10):
    print "Count = ", i
    time.sleep(1)
check_run_command.py - this one tries to capture the run_command.py output in realtime.
import subprocess

def run_command(cmd):
    p = subprocess.Popen(
        cmd,
        shell=False,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        stdin=subprocess.PIPE
    )
    while True:
        line = p.stdout.readline()
        if line == '':
            break
        print(line.strip())

if __name__ == "__main__":
    run_command("python run_command.py".split())
$ python check_run_command.py
(Waits 10 secs) then prints the following
Count = 0
Count = 1
....
Count = 9
I am not sure why I can't capture the output in real time in this case. I tried multiple solutions in other threads for the same problem, but they didn't help. Does the sleep in run_command.py have anything to do with this?
I tried running ls commands, but can't figure out if the output is printed in real time or after the process completes, because the command itself completes quickly. Hence I added one that has a sleep.
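One hedged guess, based on the buffering discussion above: the child's stdout is block-buffered when connected to a pipe, so all ten lines arrive only when the interpreter exits. Running the child with -u (or PYTHONUNBUFFERED=1) flushes each line immediately. A self-contained sketch with an inline stand-in for run_command.py:

```python
import subprocess
import sys

# -u forces the child's stdout to be unbuffered, so each Count line reaches
# the pipe as soon as it is printed rather than when the child exits.
cmd = [sys.executable, "-u", "-c",
       "import time\nfor i in range(3):\n    print('Count =', i)\n    time.sleep(0.1)"]
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
captured = [line.strip() for line in p.stdout]  # lines arrive as they are printed
p.wait()
```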

Python's subprocess.Popen object hangs gathering child output when child process does not exit

When a process exits abnormally or not at all, I still want to be able to gather what output it may have generated up until that point.
The obvious solution to this example code is to kill the child process with an os.kill, but in my real code, the child is hung waiting for NFS and does not respond to a SIGKILL.
#!/usr/bin/python
import subprocess
import os
import time
import signal
import sys

child_script = """
#!/bin/bash
i=0
while [ 1 ]; do
    echo "output line $i"
    i=$(expr $i \+ 1)
    sleep 1
done
"""
childFile = open("/tmp/childProc.sh", 'w')
childFile.write(child_script)
childFile.close()

cmd = ["bash", "/tmp/childProc.sh"]
finish = time.time() + 3
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     stdin=subprocess.PIPE)
while p.poll() is None:
    time.sleep(0.05)
    if finish < time.time():
        print "timed out and killed child, collecting what output exists so far"
        out, err = p.communicate()
        print "got it"
        sys.exit(0)
In this case, the print statement about timing out appears and the Python script never exits or progresses. Does anybody know how I can do this differently and still get output from my child process?
The problem is that bash doesn't respond to CTRL-C when not connected to a terminal.
Switching to SIGHUP or SIGTERM seems to do the trick:
cmd = ["bash", 'childProc.sh']
p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     close_fds=True)
time.sleep(3)
print 'killing pid', p.pid
os.kill(p.pid, signal.SIGTERM)
print "timed out and killed child, collecting what output exists so far"
out = p.communicate()[0]
print "got it", out
Outputs:
killing pid 5844
timed out and killed child, collecting what output exists so far
got it output line 0
output line 1
output line 2
Here's a POSIX way of doing it without the temporary file. I realize that subprocess is a little superfluous here, but since the original question used it...
import subprocess
import os
import time
import signal
import sys

pr, pw = os.pipe()
pid = os.fork()
if pid:  # parent
    os.close(pw)
    cmd = ["bash"]
    finish = time.time() + 3
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         stdin=pr, close_fds=True)
    while p.poll() is None:
        time.sleep(0.05)
        if finish < time.time():
            os.kill(p.pid, signal.SIGTERM)
            print "timed out and killed child, collecting what output exists so far"
            out, err = p.communicate()
            print "got it: ", out
            sys.exit(0)
else:  # child
    os.close(pr)
    child_script = """
#!/bin/bash
while [ 1 ]; do
    ((++i))
    echo "output line $i"
    sleep 1
done
"""
    os.write(pw, child_script)
There are good tips in another stackoverflow question: How do I get 'real-time' information back from a subprocess.Popen in python (2.5)
Most of the hints in there work with pipe.readline() instead of pipe.communicate() because the latter only returns at the end of the process.
I had the exact same problem. I ended up fixing the issue (after scouring Google and finding many related problems) by simply setting the following parameters when calling subprocess.Popen (or .call):
stdout=None
and
stderr=None
There are many problems with these functions, but in my specific case I believe stdout was being filled up by the process I was calling, resulting in a blocking condition. By setting these to None (as opposed to something like subprocess.PIPE), I believe this is avoided.
Hope this helps someone.
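The blocking condition described above is easy to reproduce: a child that writes more than the OS pipe buffer (roughly 64 KiB on Linux) blocks in its write() if nobody reads, and a parent stuck in wait() then deadlocks. communicate() avoids this by draining both pipes while waiting, without giving up capture the way stdout=None does. A minimal sketch:

```python
import subprocess
import sys

# The child writes well past the pipe-buffer size; draining with communicate()
# while waiting prevents the wait()-style deadlock described above.
big = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200000)"]
p = subprocess.Popen(big, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     text=True)
out, err = p.communicate()   # drains stdout and stderr concurrently
```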
