Run Python script within Python by using `subprocess.Popen` in real time - python

I want to run a Python script (or any executable, for that matter) from a Python script and get the output in real time. I have followed many tutorials, and my current code looks like this:
import subprocess

with open("test2", "w") as f:
    f.write("""import time
print('start')
time.sleep(5)
print('done')""")

process = subprocess.Popen(['python3', "test2"], stdout=subprocess.PIPE)
while True:
    output = process.stdout.readline()
    if output == '' and process.poll() is not None:
        break
    if output:
        print(output.strip())
rc = process.poll()
The first bit just creates the file that will be run, for clarity's sake.
I have two problems with this code:
It does not give the output in real time. It waits until the process has finished.
It does not terminate the loop once the process has finished.
Any help would be very welcome.
EDIT: Thanks to @JohnAnderson for the fix to the first problem: replacing if output == '' and process.poll() is not None: with if output == b'' and process.poll() is not None:
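For reference, a minimal sketch of the loop with both fixes applied - the b'' comparison from the edit above, plus -u so the child interpreter does not block-buffer its stdout:

import subprocess

process = subprocess.Popen(['python3', '-u', 'test2'], stdout=subprocess.PIPE)
while True:
    output = process.stdout.readline()
    # readline() returns bytes here, so compare against b'' to detect EOF
    if output == b'' and process.poll() is not None:
        break
    if output:
        print(output.strip())
rc = process.poll()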

Last night I set out to do this using a pipe:
import os
import subprocess

with open("test2", "w") as f:
    f.write("""import time
print('start')
time.sleep(2)
print('done')""")

(readend, writeend) = os.pipe()
p = subprocess.Popen(['python3', '-u', 'test2'], stdout=writeend, bufsize=0)
still_open = True
output = ""
output_buf = os.read(readend, 1).decode()
while output_buf:
    print(output_buf, end="")
    output += output_buf
    if still_open and p.poll() is not None:
        os.close(writeend)
        still_open = False
    output_buf = os.read(readend, 1).decode()
This forces buffering out of the picture and reads one character at a time (to make sure we do not block on writes from the process having filled a buffer), and it closes the writing end when the process finishes so that read catches the EOF correctly. Having looked at subprocess more closely, though, that turned out to be a bit of an overkill. With PIPE you get most of that for free, and I ended up with the following, which seems to work fine (call read as many times as necessary to keep emptying the pipe). With just this, and assuming the process finishes, you do not have to worry about polling it and/or making sure the write end of the pipe is closed to correctly detect EOF and get out of the loop:
p = subprocess.Popen(['python3', '-u', 'test2'],
                     stdout=subprocess.PIPE, bufsize=1,
                     universal_newlines=True)
output = ""
output_buf = p.stdout.readline()
while output_buf:
    print(output_buf, end="")
    output += output_buf
    output_buf = p.stdout.readline()
This is a bit less "real-time" as it is basically line buffered.
Note: I've added -u to your Python call, as you also need to make sure the called process' buffering does not get in the way.
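As an aside, on Python 3.7+ text=True is the newer spelling of universal_newlines=True, and you can iterate over the pipe directly; a minimal sketch under that assumption:

import subprocess

p = subprocess.Popen(['python3', '-u', 'test2'],
                     stdout=subprocess.PIPE, bufsize=1, text=True)
output = ""
for line in p.stdout:          # yields lines as the child prints them
    print(line, end="")
    output += line
p.wait()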

Related

Display process output incrementally using Python subprocess

I'm trying to run "docker-compose pull" from inside a Python automation script and to incrementally display the same output that Docker command would print if it was run directly from the shell. This command prints a line for each Docker image found in the system, incrementally updates each line with the Docker image's download progress (a percentage) and replaces this percentage with a "done" when the download has completed. I first tried getting the command output with subprocess.poll() and (blocking) readline() calls:
import shlex
import subprocess

def run(command, shell=False):
    p = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=shell)
    while True:
        # print one output line
        output_line = p.stdout.readline().decode('utf8')
        error_output_line = p.stderr.readline().decode('utf8')
        if output_line:
            print(output_line.strip())
        if error_output_line:
            print(error_output_line.strip())
        # check if process finished
        return_code = p.poll()
        if return_code is not None and output_line == '' and error_output_line == '':
            break
    if return_code > 0:
        print("%s failed, error code %d" % (command, return_code))

run("docker-compose pull")
The code gets stuck in the first (blocking) readline() call. Then I tried to do the same without blocking:
import select
import shlex
import subprocess
import sys
import time

def run(command, shell=False):
    p = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=shell)
    io_poller = select.poll()
    io_poller.register(p.stdout.fileno(), select.POLLIN)
    io_poller.register(p.stderr.fileno(), select.POLLIN)
    while True:
        # poll IO for output
        io_events_list = []
        while not io_events_list:
            time.sleep(1)
            io_events_list = io_poller.poll(0)
        # print new output
        for event in io_events_list:
            # must be tested because non-registered events (eg POLLHUP) can also be returned
            if event[1] & select.POLLIN:
                if event[0] == p.stdout.fileno():
                    output_str = p.stdout.read(1).decode('utf8')
                    print(output_str, end="")
                if event[0] == p.stderr.fileno():
                    error_output_str = p.stderr.read(1).decode('utf8')
                    print(error_output_str, end="")
        # check if process finished
        # when the subprocess finishes, io_poller.poll(0) returns a list with 2 select.POLLHUP events
        # (one for stdout, one for stderr) and does not enter the inner loop
        return_code = p.poll()
        if return_code is not None:
            break
    if return_code > 0:
        print("%s failed, error code %d" % (command, return_code))

run("docker-compose pull")
This works, but only the final lines (with "done" at the end) are printed to the screen, once all Docker image downloads have completed.
Both methods work fine with a command with simpler output such as "ls". Maybe the problem is related to how this Docker command prints incrementally to the screen, overwriting already written lines? Is there a safe way to incrementally show the exact output of a command in the command line when running it via a Python script?
EDIT: 2nd code block was corrected
Always open STDIN as a pipe, and if you are not using it, close it immediately.
p.stdout.read() will block until the pipe is closed, so your polling code does nothing useful here. It needs modifications.
I suggest not using shell=True
Instead of *.readline(), try with *.read(1) and wait for "\n"
Of course you can do what you want in Python; the question is how. A child process might have different ideas about what its output should look like, and that's when trouble starts. E.g. the process might explicitly want a terminal at the other end, not your process, or a lot of similar simple nonsense. Buffering may also cause problems. You can try starting Python in unbuffered mode to check (/usr/bin/python -u).
If nothing works, then use pexpect automation library instead of subprocess.
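For example, a minimal pexpect sketch (POSIX only; the command string is just an illustration). Because pexpect runs the child under a pseudo-terminal, most programs stop block-buffering their output on their own:

import pexpect

child = pexpect.spawn('docker-compose pull', encoding='utf-8')
for line in child:            # iterating reads line by line until EOF
    print(line, end='')
child.close()
print('exit status:', child.exitstatus)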
I have found a solution, based on the first code block of my question:
def run(command, shell=False):
    p = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=shell)
    while True:
        # read one char at a time
        output_line = p.stderr.read(1).decode("utf8")
        if output_line != "":
            print(output_line, end="")
        else:
            # check if process finished
            return_code = p.poll()
            if return_code is not None:
                if return_code > 0:
                    raise Exception("Command %s failed" % command)
                break
    return return_code
Notice that docker-compose uses stderr to print its progress instead of stdout. @Dalen has explained that some applications do this when they want their results to be pipeable somewhere, for instance to a file, while still being able to show their progress.
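Another option, if you would rather not read stderr directly, is to merge stderr into stdout with stderr=subprocess.STDOUT so a single read loop sees both streams. A rough sketch, where run_merged is just a hypothetical helper name:

import shlex
import subprocess

def run_merged(command):
    p = subprocess.Popen(shlex.split(command),
                         stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT)
    while True:
        # read one char at a time; stdout now carries stderr as well
        char = p.stdout.read(1).decode("utf8")
        if char:
            print(char, end="")
        elif p.poll() is not None:
            break
    return p.returncode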

Python: asynchronously print stdout from multiple subprocesses

I'm testing out a way to print out stdout from several subprocesses in Python 2.7. What I have set up is a main process that spawns, at the moment, three subprocesses and spits out their output. Each subprocess is a for-loop that goes to sleep for some random amount of time, and when it wakes up, says "Slept for X seconds".
The problem I'm seeing is that the printing out seems synchronous. Say subprocess A sleeps for 1 second, subprocess B sleeps for 3 seconds, and subprocess C sleeps for 10 seconds. The main process stops for the full 10 seconds when it's trying to see if subprocess C has something, even though the other two have probably slept and printed something out. This is to simulate if a subprocess truly has nothing to output for a longer period of time than the other two.
I need a solution which works on Windows.
My code is as follows:
main_process.py
import sys
import subprocess
logfile = open('logfile.txt', 'w')
processes = [
    subprocess.Popen('python subproc_1.py', stdout=subprocess.PIPE, bufsize=1),
    subprocess.Popen('python subproc_2.py', stdout=subprocess.PIPE, bufsize=1),
    subprocess.Popen('python subproc_3.py', stdout=subprocess.PIPE, bufsize=1),
]

while True:
    line = processes[0].stdout.readline()
    if line != '':
        sys.stdout.write(line)
        logfile.write(line)
    line = processes[1].stdout.readline()
    if line != '':
        sys.stdout.write(line)
        logfile.write(line)
    line = processes[2].stdout.readline()
    if line != '':
        sys.stdout.write(line)
        logfile.write(line)
    # If everyone is dead, break
    if processes[0].poll() is not None and \
       processes[1].poll() is not None and \
       processes[2].poll() is not None:
        break

processes[0].wait()
processes[1].wait()
print 'Done'
subproc_1.py/subproc_2.py/subproc_3.py
import time, sys, random
sleep_time = random.random() * 3
for x in range(0, 20):
    print "[PROC1] Slept for {0} seconds".format(sleep_time)
    sys.stdout.flush()
    time.sleep(sleep_time)
    sleep_time = random.random() * 3  # this is different for each subprocess.
Update: Solution
Taking the answer below along with this question, this should work.
import sys
import subprocess
from threading import Thread
try:
    from Queue import Queue, Empty
except ImportError:
    from queue import Queue, Empty  # for Python 3.x

ON_POSIX = 'posix' in sys.builtin_module_names

def enqueue_output(out, queue):
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()

if __name__ == '__main__':
    logfile = open('logfile.txt', 'w')
    processes = [
        subprocess.Popen('python subproc_1.py', stdout=subprocess.PIPE, bufsize=1),
        subprocess.Popen('python subproc_2.py', stdout=subprocess.PIPE, bufsize=1),
        subprocess.Popen('python subproc_3.py', stdout=subprocess.PIPE, bufsize=1),
    ]
    q = Queue()
    threads = []
    for p in processes:
        threads.append(Thread(target=enqueue_output, args=(p.stdout, q)))
    for t in threads:
        t.daemon = True
        t.start()
    while True:
        try:
            line = q.get_nowait()
        except Empty:
            pass
        else:
            sys.stdout.write(line)
            logfile.write(line)
            logfile.flush()
        # break when all processes are done.
        if all(p.poll() is not None for p in processes):
            break
    print 'All processes done'
I'm not sure if I need any cleanup code at the end of the while loop. If anyone has comments about it, please add them.
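One plausible cleanup, reusing the q, processes and logfile names from the code above, would be to drain whatever is still queued once the loop breaks, wait on the children, and close the log file:

    # after the while loop has broken out
    while True:
        try:
            line = q.get_nowait()
        except Empty:
            break          # queue drained
        sys.stdout.write(line)
        logfile.write(line)
    for p in processes:
        p.wait()
    logfile.close()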
And each subproc script looks similar to this (I edited for the sake of making a better example):
import datetime, time, sys, random
for x in range(0, 20):
    sleep_time = random.random() * 3
    time.sleep(sleep_time)
    timestamp = datetime.datetime.fromtimestamp(time.time()).strftime('%H%M%S.%f')
    print "[{0}][PROC1] Slept for {1} seconds".format(timestamp, sleep_time)
    sys.stdout.flush()
print "[{0}][PROC1] Done".format(timestamp)
sys.stdout.flush()
Your problem comes from the fact that readline() is a blocking function; if you call it on a file object and there isn't a line waiting to be read, the call won't return until there is a line of output. So what you have now will read repeatedly from subprocesses 1, 2, and 3 in that order, pausing at each until output is ready.
(Edit: The OP clarified that they're on Windows, which makes the below inapplicable.)
If you want to read from whichever output stream is ready, you need to check on the status of the streams in non-blocking fashion, using the select module, and then attempt reads only on those that are ready. select provides various ways of doing this, but for the sake of example we'll use select.select(). After starting your subprocesses, you'll have something like:
streams = [p.stdout for p in processes]

def output(s):
    for f in [sys.stdout, logfile]:
        f.write(s)
        f.flush()

while True:
    rstreams, _, _ = select.select(streams, [], [])
    for stream in rstreams:
        line = stream.readline()
        output(line)
    if all(p.poll() is not None for p in processes):
        break

for stream in streams:
    output(stream.read())
What select() does, when called with three lists of file objects (or file descriptors), is return three subsets of its arguments, which are the streams that are ready for reading, are ready for writing, or have an error condition. Thus on each iteration of the loop we check to see which output streams are ready to read, and iterate over just those. Then we repeat. (Note that it's important here that you're line-buffering the output; the above code assumes that if a stream is ready for reading there's at least one full line ready to be read. If you specify different buffering the above can block.)
A further problem with your original code: When you exit the loop after poll() reports all subprocesses to have exited, you might not have read all their output. So you need to do a last sweep over the streams to read any remaining output.
Note: The example code I gave doesn't try all that hard to capture the subprocesses' output in exactly the order in which it becomes available (which is impossible to do perfectly, but can be approximated more closely than the above manages to do). It also lacks other refinements (for example, in the main loop it'll continue to select on the stdout of every subprocess, even after some have already terminated, which is harmless, but inefficient). It's just meant to illustrate a basic technique of non-blocking IO.
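As an illustration of that last refinement (still POSIX-only, reusing the streams and output names from above), you can stop selecting on a stream once readline() returns an empty string, i.e. once that stream has hit EOF:

streams = [p.stdout for p in processes]
while streams:
    rstreams, _, _ = select.select(streams, [], [])
    for stream in rstreams:
        line = stream.readline()
        if line:
            output(line)
        else:
            # empty string means EOF: stop watching this stream
            streams.remove(stream)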

How can I use Python to pipe stdin/stdout to Perl script

This Python code pipes data through a Perl script fine.
import subprocess
kw = {}
kw['executable'] = None
kw['shell'] = True
kw['stdin'] = None
kw['stdout'] = subprocess.PIPE
kw['stderr'] = subprocess.PIPE
args = ' '.join(['/usr/bin/perl','-w','/path/script.perl','<','/path/mydata'])
subproc = subprocess.Popen(args,**kw)
for line in iter(subproc.stdout.readline, ''):
    print line.rstrip().decode('UTF-8')
However, it requires that I first save my buffers to a disk file (/path/mydata). It's cleaner to loop through the data in Python code and pass it line-by-line to the subprocess like this:
import codecs
import subprocess

kw = {}
kw['executable'] = '/usr/bin/perl'
kw['shell'] = False
kw['stderr'] = subprocess.PIPE
kw['stdin'] = subprocess.PIPE
kw['stdout'] = subprocess.PIPE

args = ['-w','/path/script.perl',]
subproc = subprocess.Popen(args, **kw)
f = codecs.open('/path/mydata','r','UTF-8')
for line in f:
    subproc.stdin.write('%s\n'%(line.strip().encode('UTF-8')))
    print line.strip() ### code hangs after printing this ###
for line in iter(subproc.stdout.readline, ''):
    print line.rstrip().decode('UTF-8')
subproc.terminate()
f.close()
The code hangs with the readline after sending the first line to the subprocess. I have other executables that use this exact same code perfectly.
My data files can be quite large (1.5 GB). Is there a way to accomplish piping the data without saving it to a file? I don't want to re-write the Perl script for compatibility with other systems.
Your code is blocking at the line:
for line in iter(subproc.stdout.readline, ''):
because the only way this iteration can terminate is when EOF (end-of-file) is reached, which will happen when the subprocess terminates. You don't want to wait until the process terminates, however; you only want to wait until it has finished processing the line that was sent to it.
Furthermore, you're encountering issues with buffering, as Chris Morgan has already pointed out. Another question on Stack Overflow discusses how you can do non-blocking reads with subprocess. I've hacked up a quick and dirty adaptation of the code from that question to your problem:
import codecs
import subprocess
import threading
import Queue

def enqueue_output(out, queue):
    for line in iter(out.readline, ''):
        queue.put(line)
    out.close()

kw = {}
kw['executable'] = '/usr/bin/perl'
kw['shell'] = False
kw['stderr'] = subprocess.PIPE
kw['stdin'] = subprocess.PIPE
kw['stdout'] = subprocess.PIPE

args = ['-w','/path/script.perl',]
subproc = subprocess.Popen(args, **kw)
f = codecs.open('/path/mydata','r','UTF-8')

q = Queue.Queue()
t = threading.Thread(target=enqueue_output, args=(subproc.stdout, q))
t.daemon = True
t.start()

for line in f:
    subproc.stdin.write('%s\n'%(line.strip().encode('UTF-8')))
    print "Sent:", line.strip()
    try:
        line = q.get_nowait()
    except Queue.Empty:
        pass
    else:
        print "Received:", line.rstrip().decode('UTF-8')

subproc.terminate()
f.close()
It's quite likely that you'll need to make modifications to this code, but at least it doesn't block.
Thanks srgerg. I had also tried the threading solution. This solution alone, however, always hung. Both my previous code and srgerg's code were missing the final solution. Your tip gave me one last idea.
The final solution writes enough dummy data to force the final valid lines out of the buffer. To support this, I added code that tracks how many valid lines were written to stdin. The threaded loop opens the output file, saves the data, and breaks when the lines read equal the valid input lines. This solution ensures it reads and writes line-by-line for any size file.
def std_output(stdout, outfile=''):
    out = 0
    f = codecs.open(outfile, 'w', 'UTF-8')
    for line in iter(stdout.readline, ''):
        f.write('%s\n' % (line.rstrip().decode('UTF-8')))
        out += 1
        if i == out: break
    stdout.close()
    f.close()

outfile = '/path/myout'
infile = '/path/mydata'

subproc = subprocess.Popen(args, **kw)
t = threading.Thread(target=std_output, args=[subproc.stdout, outfile])
t.daemon = True
t.start()

i = 0
f = codecs.open(infile, 'r', 'UTF-8')
for line in f:
    subproc.stdin.write('%s\n' % (line.strip().encode('UTF-8')))
    i += 1
subproc.stdin.write('%s\n' % (' '*4096)) ### push dummy data ###
f.close()
t.join()
subproc.terminate()
See the warnings mentioned in the manual about using Popen.stdin and Popen.stdout (just above Popen.stdin):
Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
I realise that having a gigabyte-and-a-half string in memory all at once isn't very desirable, but using communicate() is a way that will work, while as you've observed, once the OS pipe buffer fills up, the stdin.write() + stdout.read() way can become deadlocked.
Is using communicate() feasible for you?
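For reference, a rough sketch of the communicate() route, with the trade-off that the whole output is held in memory at once:

import codecs
import subprocess
import sys

args = ['/usr/bin/perl', '-w', '/path/script.perl']
subproc = subprocess.Popen(args,
                           stdin=subprocess.PIPE,
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE)
with codecs.open('/path/mydata', 'r', 'UTF-8') as f:
    data = f.read().encode('UTF-8')
# communicate() feeds all of stdin, waits for the process to exit,
# and returns everything written to stdout and stderr
out, err = subproc.communicate(data)
sys.stdout.write(out.decode('UTF-8'))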

How to get output from subprocess.Popen(). proc.stdout.readline() blocks, no data prints out

I want to capture the output from executing Test_Pipe.py. I tried the following code on Linux, but it did not work.
Test_Pipe.py
import time

while True:
    print "Something ..."
    time.sleep(.1)
Caller.py
import subprocess as subp
import time
proc = subp.Popen(["python", "Test_Pipe.py"], stdout=subp.PIPE, stdin=subp.PIPE)
while True:
    data = proc.stdout.readline()  # block / wait
    print data
    time.sleep(.1)
The line proc.stdout.readline() was blocked, so no data prints out.
You obviously can use subprocess.communicate but I think you are looking for real time input and output.
readline was blocked because the process is probably waiting on your input. You can read character by character to overcome this like the following:
import subprocess
import sys
process = subprocess.Popen(
    cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
while True:
    out = process.stdout.read(1)
    if out == '' and process.poll() != None:
        break
    if out != '':
        sys.stdout.write(out)
        sys.stdout.flush()
Nadia's snippet does work, but calling read with a 1 byte buffer is highly discouraged. The better way to do this would be to set the stdout file descriptor to non-blocking using fcntl
fcntl.fcntl(
    proc.stdout.fileno(),
    fcntl.F_SETFL,
    fcntl.fcntl(proc.stdout.fileno(), fcntl.F_GETFL) | os.O_NONBLOCK,
)
and then using select to test if the data is ready
while proc.poll() == None:
    readx = select.select([proc.stdout.fileno()], [], [])[0]
    if readx:
        chunk = proc.stdout.read()
        print chunk
She was correct in that your problem must be different from what you posted as Caller.py and Test_Pipe.py do work as provided.
https://derrickpetzold.com/p/capturing-output-from-ffmpeg-python/
Test_Pipe.py buffers its stdout by default so proc in Caller.py doesn't see any output until the child's buffer is full (if the buffer size is 8KB then it takes around a minute to fill Test_Pipe.py's stdout buffer).
To make the output unbuffered (line-buffered for text streams) you could pass the -u flag to the child Python script. It allows you to read the subprocess' output line by line in "real time":
import sys
from subprocess import Popen, PIPE
proc = Popen([sys.executable, "-u", "Test_Pipe.py"], stdout=PIPE, bufsize=1)
for line in iter(proc.stdout.readline, b''):
    print line,
proc.communicate()
See links in Python: read streaming input from subprocess.communicate() on how to solve the block-buffering issue for non-Python child processes.
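For non-Python children on Linux, one common workaround is to wrap the command in stdbuf from GNU coreutils, which forces the child's stdout into line-buffered mode; a sketch, where /usr/bin/some_tool is only a placeholder command:

from subprocess import Popen, PIPE

# stdbuf -oL asks the (stdio-based) child to line-buffer its stdout
proc = Popen(['stdbuf', '-oL', '/usr/bin/some_tool'], stdout=PIPE, bufsize=1)
for line in iter(proc.stdout.readline, b''):
    print line,
proc.wait()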
To avoid the many problems that can always arise with buffering for tasks such as "getting the subprocess's output to the main process in real time", I always recommend using pexpect on all non-Windows platforms, and wexpect on Windows, instead of subprocess, when such tasks are desired.

How do I get 'real-time' information back from a subprocess.Popen in python (2.5)

I'd like to use the subprocess module in the following way:
create a new process that potentially takes a long time to execute.
capture stdout (or stderr, or potentially both, either together or separately)
Process data from the subprocess as it comes in, perhaps firing events on every line received (in wxPython say) or simply printing them out for now.
I've created processes with Popen, but if I use communicate() the data comes at me all at once, once the process has terminated.
If I create a separate thread that does a blocking readline() of myprocess.stdout (using stdout = subprocess.PIPE) I don't get any lines with this method either, until the process terminates. (no matter what I set as bufsize)
Is there a way to deal with this that isn't horrendous, and works well on multiple platforms?
Update with code that appears not to work (on windows anyway)
import threading
import wx

class ThreadWorker(threading.Thread):
    def __init__(self, callable, *args, **kwargs):
        super(ThreadWorker, self).__init__()
        self.callable = callable
        self.args = args
        self.kwargs = kwargs
        self.setDaemon(True)

    def run(self):
        try:
            self.callable(*self.args, **self.kwargs)
        except wx.PyDeadObjectError:
            pass
        except Exception, e:
            print e

if __name__ == "__main__":
    import os
    from subprocess import Popen, PIPE

    def worker(pipe):
        while True:
            line = pipe.readline()
            if line == '': break
            else: print line

    proc = Popen("python subprocess_test.py", shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    stdout_worker = ThreadWorker(worker, proc.stdout)
    stderr_worker = ThreadWorker(worker, proc.stderr)
    stdout_worker.start()
    stderr_worker.start()
    while True: pass
stdout will be buffered - so you won't get anything till that buffer is filled, or the subprocess exits.
You can try flushing stdout from the sub-process, or using stderr, or changing stdout on non-buffered mode.
It sounds like the issue might be the use of buffered output by the subprocess - if a relatively small amount of output is created, it could be buffered until the subprocess exits.
Here's what worked for me:
cmd = ["./tester_script.bash"]
p = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
while p.poll() is None:
    out = p.stdout.readline()
    do_something_with(out, err)
In your case you could try to pass a reference to the sub-process to your Worker Thread, and do the polling inside the thread. I don't know how it will behave when two threads poll (and interact with) the same subprocess, but it may work.
Also note that the while p.poll() is None: is intended as is. Do not replace it with while not p.poll(), since in Python 0 (the return code for successful termination) is also considered False.
I've been running into this problem as well. It occurs because you are also trying to read stderr: if there are no errors, then trying to read from stderr will block.
On Windows, there is no easy way to poll() file descriptors (only Winsock sockets).
So a solution is not to try and read from stderr.
Using pexpect [http://www.noah.org/wiki/Pexpect] with non-blocking readlines will resolve this problem. It stems from the fact that pipes are buffered, and so your app's output is getting buffered by the pipe, therefore you can't get to that output until the buffer fills or the process dies.
This seems to be a well-known Python limitation, see
PEP 3145 and maybe others.
Read one character at a time: http://blog.thelinuxkid.com/2013/06/get-python-subprocess-output-without.html
import contextlib
import subprocess
# Unix, Windows and old Macintosh end-of-line
newlines = ['\n', '\r\n', '\r']
def unbuffered(proc, stream='stdout'):
    stream = getattr(proc, stream)
    with contextlib.closing(stream):
        while True:
            out = []
            last = stream.read(1)
            # Don't loop forever
            if last == '' and proc.poll() is not None:
                break
            while last not in newlines:
                # Don't loop forever
                if last == '' and proc.poll() is not None:
                    break
                out.append(last)
                last = stream.read(1)
            out = ''.join(out)
            yield out

def example():
    cmd = ['ls', '-l', '/']
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        # Make all end-of-lines '\n'
        universal_newlines=True,
    )
    for line in unbuffered(proc):
        print line

example()
Using subprocess.Popen, I can run the .exe of one of my C# projects and redirect the output to my Python file. I am now able to print() all the information being output to the C# console (using Console.WriteLine()) to the Python console.
Python code:
from subprocess import Popen, PIPE, STDOUT
p = Popen('ConsoleDataImporter.exe', stdout=PIPE, stderr=STDOUT, shell=True)
while True:
    line = p.stdout.readline()
    print(line)
    if not line:
        break
This gets the console output of my .NET project line by line as it is created and breaks out of the enclosing while loop upon the project's termination. I'd imagine this would work for two python files as well.
I've used the pexpect module for this, it seems to work ok. http://sourceforge.net/projects/pexpect/
