Simple parallelization of subprocess.run() calls?

Consider the following snippet that runs three different subprocesses one after the other with subprocess.run (and notably all with defaulted kwargs):
import subprocess

p1 = subprocess.run(args1)
if p1.returncode != 0:
    error()
p2 = subprocess.run(args2)
if p2.returncode != 0:
    error()
p3 = subprocess.run(args3)
if p3.returncode != 0:
    error()
How can we rewrite this so that the subprocesses are run in parallel to each other?
With Popen right? What does that exactly look like?
For reference, the implementation of subprocess.run is essentially:
with Popen(*popenargs, **kwargs) as process:
    try:
        stdout, stderr = process.communicate(input, timeout=timeout)
    except TimeoutExpired as exc:
        process.kill()
        if _mswindows:
            exc.stdout, exc.stderr = process.communicate()
        else:
            process.wait()
        raise
    except:
        process.kill()
        raise
    retcode = process.poll()
    return CompletedProcess(process.args, retcode, stdout, stderr)
So something like...
with Popen(args1) as p1:
    with Popen(args2) as p2:
        with Popen(args3) as p3:
            try:
                p1.communicate(None, timeout=None)
                p2.communicate(None, timeout=None)
                p3.communicate(None, timeout=None)
            except:
                p1.kill()
                p2.kill()
                p3.kill()
                raise
if p1.poll() != 0 or p2.poll() != 0 or p3.poll() != 0:
    error()
Is that along the right lines?
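And for an arbitrary number of commands, I imagine contextlib.ExitStack could replace the hard-coded nesting. An untested sketch, assuming the same args1/args2/args3 and error() as above:

import contextlib
import subprocess

commands = [args1, args2, args3]
with contextlib.ExitStack() as stack:
    # enter each Popen context; all children run concurrently
    procs = [stack.enter_context(subprocess.Popen(args)) for args in commands]
    try:
        for p in procs:
            p.communicate()
    except:
        for p in procs:
            p.kill()
        raise
if any(p.poll() != 0 for p in procs):
    error()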

I would just use multiprocessing to accomplish this, making sure that each invocation of subprocess.run uses capture_output=True so that the output from the 3 commands running in parallel is not interleaved:
import multiprocessing
import subprocess

def runner(args):
    p = subprocess.run(args, capture_output=True, text=True)
    if p.returncode != 0:
        raise Exception(f'Return code was {p.returncode}.')
    return p.stdout, p.stderr

def main():
    args1 = ['git', 'status']
    args2 = ['git', 'log', '-3']
    args3 = ['git', 'branch']
    args = [args1, args2, args3]
    with multiprocessing.Pool(3) as pool:
        results = [pool.apply_async(runner, args=(arg,)) for arg in args]
        for result in results:
            try:
                out, err = result.get()
                print(out, end='')
            except Exception as e:  # runner completed with an Exception
                print(e)

if __name__ == '__main__':  # required for Windows
    main()
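Since the workers spend nearly all their time blocked waiting on the child processes, a thread pool should work just as well here and avoids spawning extra interpreter processes. A sketch of the same idea with concurrent.futures, reusing the runner function above (my variation, not tested on Windows):

from concurrent.futures import ThreadPoolExecutor

def main():
    args = [['git', 'status'], ['git', 'log', '-3'], ['git', 'branch']]
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(runner, arg) for arg in args]
        for future in futures:
            try:
                out, err = future.result()  # re-raises the runner's exception, if any
                print(out, end='')
            except Exception as e:
                print(e)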
Update
With just subprocess we have something like:
import subprocess
args1 = ['git', 'status']
args2 = ['git', 'log', '-3']
args3 = ['git', 'branch']
p1 = subprocess.Popen(args1)
p2 = subprocess.Popen(args2)
p3 = subprocess.Popen(args3)
p1.communicate()
rc1 = p1.returncode
p2.communicate()
rc2 = p2.returncode
p3.communicate()
rc3 = p3.returncode
But, for whatever reason, on my Windows platform I never saw the output from the third subprocess command ('git branch'), so there must be some limitation there. Also, if the command you were running required input from stdin before proceeding, that input would have to be provided to the communicate method. But communicate does not return until the entire subprocess has completed, so you would get no parallelism; as a general solution this is not really very good. In the multiprocessing code, there is no problem with having stdin input to communicate.
Update 2
When I recode it as follows, I now get all the expected output. I am not sure why it makes a difference, however. According to the documentation, Popen.communicate:
Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate and set the returncode attribute. The optional input argument should be data to be sent to the child process, or None, if no data should be sent to the child. If streams were opened in text mode, input must be a string. Otherwise, it must be bytes.
So the call should be waiting for the process to terminate. Nevertheless, my preceding comment still stands: a command that requires stdin input (via a pipe) would not run in parallel without using multiprocessing.
import subprocess

args1 = ['git', 'status']
args2 = ['git', 'log', '-3']
args3 = ['git', 'branch']
with subprocess.Popen(args1) as p1:
    with subprocess.Popen(args2) as p2:
        with subprocess.Popen(args3) as p3:
            p1.communicate()
            rc1 = p1.returncode
            p2.communicate()
            rc2 = p2.returncode
            p3.communicate()
            rc3 = p3.returncode
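The same pattern extends to any number of commands, since the parallelism comes purely from starting every Popen before waiting on any of them. A sketch:

import subprocess

commands = [['git', 'status'], ['git', 'log', '-3'], ['git', 'branch']]
procs = [subprocess.Popen(args) for args in commands]  # all started before any wait
return_codes = []
for p in procs:
    p.communicate()  # wait for this child to finish
    return_codes.append(p.returncode)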

Related

Python3 pipe output of multiple processes to a single process [duplicate]

I know how to run a command using cmd = subprocess.Popen and then subprocess.communicate.
Most of the time I use a string tokenized with shlex.split as 'argv' argument for Popen.
Example with "ls -l":
import subprocess
import shlex
print subprocess.Popen(shlex.split(r'ls -l'), stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.PIPE).communicate()[0]
However, pipes seem not to work... For instance, the following example returns nothing:
import subprocess
import shlex
print subprocess.Popen(shlex.split(r'ls -l | sed "s/a/b/g"'), stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.PIPE).communicate()[0]
Can you tell me what I am doing wrong please?
Thx
I think you want to instantiate two separate Popen objects here, one for 'ls' and the other for 'sed'. You'll want to pass the first Popen object's stdout attribute as the stdin argument to the 2nd Popen object.
Example:
p1 = subprocess.Popen('ls ...', stdout=subprocess.PIPE)
p2 = subprocess.Popen('sed ...', stdin=p1.stdout, stdout=subprocess.PIPE)
print p2.communicate()
You can keep chaining this way if you have more commands:
p3 = subprocess.Popen('prog', stdin=p2.stdout, ...)
See the subprocess documentation for more info on how to work with subprocesses.
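For completeness, here is the full ls | sed pipeline from the question written that way, including the p1.stdout.close() call the subprocess documentation recommends so that ls receives a SIGPIPE if sed exits first:

import shlex
import subprocess

p1 = subprocess.Popen(shlex.split(r'ls -l'), stdout=subprocess.PIPE)
p2 = subprocess.Popen(shlex.split(r'sed "s/a/b/g"'), stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # let p1 receive SIGPIPE if p2 exits early
output = p2.communicate()[0]
print(output)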
I've made a little function to help with the piping, hope it helps. It will chain Popens as needed.
from subprocess import Popen, PIPE
import shlex

def run(cmd):
    """Runs the given command locally and returns the output, err and exit_code."""
    if "|" in cmd:
        cmd_parts = cmd.split('|')
    else:
        cmd_parts = []
        cmd_parts.append(cmd)
    i = 0
    p = {}
    for cmd_part in cmd_parts:
        cmd_part = cmd_part.strip()
        if i == 0:
            p[i] = Popen(shlex.split(cmd_part), stdin=None, stdout=PIPE, stderr=PIPE)
        else:
            p[i] = Popen(shlex.split(cmd_part), stdin=p[i-1].stdout, stdout=PIPE, stderr=PIPE)
        i = i + 1
    (output, err) = p[i-1].communicate()
    exit_code = p[0].wait()
    return str(output), str(err), exit_code
output, err, exit_code = run("ls -lha /var/log | grep syslog | grep gz")

if exit_code != 0:
    print "Output:"
    print output
    print "Error:"
    print err
    # Handle error here
else:
    # Be happy :D
    print output
shlex only splits a string into tokens according to shell rules; it does not handle pipes.
It should, however, work this way:
import subprocess
import shlex

sp_ls = subprocess.Popen(shlex.split(r'ls -l'), stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sp_sed = subprocess.Popen(shlex.split(r'sed "s/a/b/g"'), stdin=sp_ls.stdout, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sp_ls.stdin.close()  # makes it similar to /dev/null
output = sp_sed.communicate()[0]  # read the end of the pipeline (errors are ignored)
print output
According to help(subprocess):
Replacing shell pipe line
-------------------------
output=`dmesg | grep hda`
==>
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
HTH
"""
Why don't you use shell
"""
from subprocess import Popen, PIPE

def output_shell(line):
    try:
        shell_command = Popen(line, stdout=PIPE, stderr=PIPE, shell=True)
    except OSError:
        return None
    except ValueError:
        return None

    (output, err) = shell_command.communicate()
    shell_command.wait()
    if shell_command.returncode != 0:
        print "Shell command failed to execute"
        return None
    return str(output)
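Usage is then a one-liner, with the shell doing all the plumbing (the usual caveat applies: only pass trusted strings when shell=True):

result = output_shell('ls -l | sed "s/a/b/g"')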
Thanks to @hernvnc, @glglgl, and @Jacques Gaudin for the answers. I fixed the code from @hernvnc: his version will hang in some scenarios.
import shlex
from subprocess import PIPE
from subprocess import Popen

def run(cmd, input=None):
    """Runs the given command locally and returns the output, err and exit_code."""
    if "|" in cmd:
        cmd_parts = cmd.split('|')
    else:
        cmd_parts = []
        cmd_parts.append(cmd)
    i = 0
    p = {}
    for cmd_part in cmd_parts:
        cmd_part = cmd_part.strip()
        if i == 0:
            if input:
                p[i] = Popen(shlex.split(cmd_part), stdin=PIPE, stdout=PIPE, stderr=PIPE)
            else:
                p[i] = Popen(shlex.split(cmd_part), stdin=None, stdout=PIPE, stderr=PIPE)
        else:
            p[i] = Popen(shlex.split(cmd_part), stdin=p[i-1].stdout, stdout=PIPE, stderr=PIPE)
        i = i + 1
    # close the stdin explicitly, otherwise, the following case will hang.
    if input:
        p[0].stdin.write(input)
        p[0].stdin.close()
    (output, err) = p[i-1].communicate()
    exit_code = p[0].wait()
    return str(output), str(err), exit_code
# test case below
inp = b'[ CMServer State ]\n\nnode node_ip instance state\n--------------------------------------------\n1 linux172 10.90.56.172 1 Primary\n2 linux173 10.90.56.173 2 Standby\n3 linux174 10.90.56.174 3 Standby\n\n[ ETCD State ]\n\nnode node_ip instance state\n--------------------------------------------------\n1 linux172 10.90.56.172 7001 StateFollower\n2 linux173 10.90.56.173 7002 StateLeader\n3 linux174 10.90.56.174 7003 StateFollower\n\n[ Cluster State ]\n\ncluster_state : Normal\nredistributing : No\nbalanced : No\ncurrent_az : AZ_ALL\n\n[ Datanode State ]\n\nnode node_ip instance state | node node_ip instance state | node node_ip instance state\n------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n1 linux172 10.90.56.172 6001 P Standby Normal | 2 linux173 10.90.56.173 6002 S Primary Normal | 3 linux174 10.90.56.174 6003 S Standby Normal'
cmd = "grep -E 'Primary' | tail -1 | awk '{print $3}'"
run(cmd, input=inp)

Display Popen.communicate() in real time [duplicate]

I have a python subprocess that I'm trying to read output and error streams from. Currently I have it working, but I'm only able to read from stderr after I've finished reading from stdout. Here's what it looks like:
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout_iterator = iter(process.stdout.readline, b"")
stderr_iterator = iter(process.stderr.readline, b"")

for line in stdout_iterator:
    # Do stuff with line
    print line

for line in stderr_iterator:
    # Do stuff with line
    print line
As you can see, the stderr for loop can't start until the stdout loop completes. How can I modify this to be able to read from both in the correct order the lines come in?
To clarify: I still need to be able to tell whether a line came from stdout or stderr because they will be treated differently in my code.
The code in your question may deadlock if the child process produces enough output on stderr (~100KB on my Linux machine).
There is a communicate() method that lets you read from both stdout and stderr separately:
from subprocess import Popen, PIPE
process = Popen(command, stdout=PIPE, stderr=PIPE)
output, err = process.communicate()
If you need to read the streams while the child process is still running then the portable solution is to use threads (not tested):
from subprocess import Popen, PIPE
from threading import Thread
from Queue import Queue  # Python 2

def reader(pipe, queue):
    try:
        with pipe:
            for line in iter(pipe.readline, b''):
                queue.put((pipe, line))
    finally:
        queue.put(None)

process = Popen(command, stdout=PIPE, stderr=PIPE, bufsize=1)
q = Queue()
Thread(target=reader, args=[process.stdout, q]).start()
Thread(target=reader, args=[process.stderr, q]).start()
for _ in range(2):
    for source, line in iter(q.get, None):
        print "%s: %s" % (source, line),
See:
Python: read streaming input from subprocess.communicate()
Non-blocking read on a subprocess.PIPE in python
Python subprocess get children's output to file and terminal?
Here's a solution based on selectors, but one that preserves order, and streams variable-length characters (even single chars).
The trick is to use read1(), instead of read().
import selectors
import subprocess
import sys

p = subprocess.Popen(
    ["python", "random_out.py"], stdout=subprocess.PIPE, stderr=subprocess.PIPE
)

sel = selectors.DefaultSelector()
sel.register(p.stdout, selectors.EVENT_READ)
sel.register(p.stderr, selectors.EVENT_READ)

while True:
    for key, _ in sel.select():
        data = key.fileobj.read1().decode()
        if not data:
            exit()
        if key.fileobj is p.stdout:
            print(data, end="")
        else:
            print(data, end="", file=sys.stderr)
If you want a test program, use this.
import sys
from time import sleep

for i in range(10):
    print(f" x{i} ", file=sys.stderr, end="")
    sleep(0.1)
    print(f" y{i} ", end="")
    sleep(0.1)
The order in which a process writes data to different pipes is lost after write.
There is no way you can tell if stdout has been written before stderr.
You can try to read data simultaneously from multiple file descriptors in a non-blocking way
as soon as data is available, but this would only minimize the probability that the order is incorrect.
This program should demonstrate this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import select
import subprocess

testapps = {
    'slow': '''
import os
import time
os.write(1, 'aaa')
time.sleep(0.01)
os.write(2, 'bbb')
time.sleep(0.01)
os.write(1, 'ccc')
''',
    'fast': '''
import os
os.write(1, 'aaa')
os.write(2, 'bbb')
os.write(1, 'ccc')
''',
    'fast2': '''
import os
os.write(1, 'aaa')
os.write(2, 'bbbbbbbbbbbbbbb')
os.write(1, 'ccc')
'''
}

def readfds(fds, maxread):
    while True:
        fdsin, _, _ = select.select(fds, [], [])
        for fd in fdsin:
            s = os.read(fd, maxread)
            if len(s) == 0:
                fds.remove(fd)
                continue
            yield fd, s
        if fds == []:
            break

def readfromapp(app, rounds=10, maxread=1024):
    f = open('testapp.py', 'w')
    f.write(testapps[app])
    f.close()

    results = {}
    for i in range(0, rounds):
        p = subprocess.Popen(['python', 'testapp.py'], stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE)
        data = ''
        for (fd, s) in readfds([p.stdout.fileno(), p.stderr.fileno()], maxread):
            data = data + s
        results[data] = results[data] + 1 if data in results else 1

    print 'running %i rounds %s with maxread=%i' % (rounds, app, maxread)
    results = sorted(results.items(), key=lambda (k, v): k, reverse=False)
    for data, count in results:
        print '%03i x %s' % (count, data)

print
print "=> if output is produced slowly this should work as wished"
print "   and should return: aaabbbccc"
readfromapp('slow', rounds=100, maxread=1024)

print
print "=> now mostly aaacccbbb is returned, not as it should be"
readfromapp('fast', rounds=100, maxread=1024)

print
print "=> you could try to read data one by one, and return"
print "   e.g. a whole line only when LF is read"
print "   (b's should be finished before c's)"
readfromapp('fast', rounds=100, maxread=1)

print
print "=> but even this won't work ..."
readfromapp('fast2', rounds=100, maxread=1)
and outputs something like this:
=> if output is produced slowly this should work as wished
   and should return: aaabbbccc
running 100 rounds slow with maxread=1024
100 x aaabbbccc

=> now mostly aaacccbbb is returned, not as it should be
running 100 rounds fast with maxread=1024
006 x aaabbbccc
094 x aaacccbbb

=> you could try to read data one by one, and return
   e.g. a whole line only when LF is read
   (b's should be finished before c's)
running 100 rounds fast with maxread=1
003 x aaabbbccc
003 x aababcbcc
094 x abababccc

=> but even this won't work ...
running 100 rounds fast2 with maxread=1
003 x aaabbbbbbbbbbbbbbbccc
001 x aaacbcbcbbbbbbbbbbbbb
008 x aababcbcbcbbbbbbbbbbb
088 x abababcbcbcbbbbbbbbbb
This works for Python3 (3.6):
import selectors
import subprocess
import sys

p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE, universal_newlines=True)

# Read both stdout and stderr simultaneously
sel = selectors.DefaultSelector()
sel.register(p.stdout, selectors.EVENT_READ)
sel.register(p.stderr, selectors.EVENT_READ)
ok = True
while ok:
    for key, val1 in sel.select():
        line = key.fileobj.readline()
        if not line:
            ok = False
            break
        if key.fileobj is p.stdout:
            print(f"STDOUT: {line}", end="")
        else:
            print(f"STDERR: {line}", end="", file=sys.stderr)
from https://docs.python.org/3/library/subprocess.html#using-the-subprocess-module
If you wish to capture and combine both streams into one, use
stdout=PIPE and stderr=STDOUT instead of capture_output.
so the easiest solution would be:
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
stdout_iterator = iter(process.stdout.readline, b"")
for line in stdout_iterator:
# Do stuff with line
print line
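On Python 3 the same combined-stream idea reads a little cleaner, since a pipe opened with text=True can be iterated over directly (a sketch, assuming command is defined):

import subprocess

process = subprocess.Popen(command, stdout=subprocess.PIPE,
                           stderr=subprocess.STDOUT, text=True)
for line in process.stdout:
    # Do stuff with line
    print(line, end='')
process.wait()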
I know this question is very old, but this answer may help others who stumble upon this page in researching a solution for a similar situation, so I'm posting it anyway.
I've built a simple python snippet that will merge any number of pipes into a single one. Of course, as stated above, the order cannot be guaranteed, but this is as close as I think you can get in Python.
It spawns a thread for each of the pipes, reads them line by line and puts them into a Queue (which is FIFO). The main thread loops through the queue, yielding each line.
import threading, queue

def merge_pipes(**named_pipes):
    r'''
    Merges multiple pipes from subprocess.Popen (maybe other sources as well).
    The keyword argument keys will be used in the output to identify the source
    of the line.

    Example:
    p = subprocess.Popen(['some', 'call'],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    outputs = {'out': log.info, 'err': log.warn}
    for name, line in merge_pipes(out=p.stdout, err=p.stderr):
        outputs[name](line)

    This will output stdout to the info logger, and stderr to the warning logger
    '''

    # Constants. Could also be placed outside of the method. I just put them here
    # so the method is fully self-contained
    PIPE_OPENED = 1
    PIPE_OUTPUT = 2
    PIPE_CLOSED = 3

    # Create a queue where the pipes will be read into
    output = queue.Queue()

    # This method is the run body for the threads that are instantiated below.
    # This could be easily rewritten to be outside of the merge_pipes method,
    # but to make it fully self-contained I put it here.
    def pipe_reader(name, pipe):
        r"""
        reads a single pipe into the queue
        """
        output.put((PIPE_OPENED, name,))
        try:
            for line in iter(pipe.readline, ''):
                output.put((PIPE_OUTPUT, name, line.rstrip(),))
        finally:
            output.put((PIPE_CLOSED, name,))

    # Start a reader for each pipe
    for name, pipe in named_pipes.items():
        t = threading.Thread(target=pipe_reader, args=(name, pipe,))
        t.daemon = True
        t.start()

    # Use a counter to determine how many pipes are left open.
    # If all are closed, we can return
    pipe_count = 0

    # Read the queue in order, blocking if there's no data
    for data in iter(output.get, ''):
        code = data[0]
        if code == PIPE_OPENED:
            pipe_count += 1
        elif code == PIPE_CLOSED:
            pipe_count -= 1
        elif code == PIPE_OUTPUT:
            yield data[1:]
        if pipe_count == 0:
            return
This works for me (on windows):
https://github.com/waszil/subpiper
from subpiper import subpiper

def my_stdout_callback(line: str):
    print(f'STDOUT: {line}')

def my_stderr_callback(line: str):
    print(f'STDERR: {line}')

my_additional_path_list = [r'c:\important_location']

retcode = subpiper(cmd='echo magic',
                   stdout_callback=my_stdout_callback,
                   stderr_callback=my_stderr_callback,
                   add_path_list=my_additional_path_list)

How to thread multiple subprocess instances in Python 2.7?

I have three commands that would otherwise be easily chained together on the command-line like so:
$ echo foo | firstCommand - | secondCommand - | thirdCommand - > finalOutput
In other words, the firstCommand processes foo from standard input and pipes the result to secondCommand, which in turn processes that input and pipes its output to thirdCommand, which does processing and redirects its output to the file finalOutput.
I have been trying to recapitulate this in a Python script, using threading. I'd like to use Python in order to manipulate the output from firstCommand before passing it to secondCommand, and again between secondCommand and thirdCommand.
Here's an excerpt of code that does not seem to work:
first_process = subprocess.Popen(['firstCommand', '-'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
second_process = subprocess.Popen(['secondCommand', '-'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
third_process = subprocess.Popen(['thirdCommand', '-'], stdin=subprocess.PIPE, stdout=sys.stdout)

first_thread = threading.Thread(target=consumeOutputFromStdin, args=(sys.stdin, first_process.stdin))
second_thread = threading.Thread(target=consumeOutputFromFirstCommand, args=(first_process.stdout, second_process.stdin))
third_thread = threading.Thread(target=consumeOutputFromSecondCommand, args=(second_process.stdout, third_process.stdin))

first_thread.start()
second_thread.start()
third_thread.start()

first_thread.join()
second_thread.join()
third_thread.join()

first_process.communicate()
second_process.communicate()
third_process.communicate()

# read 1K chunks from standard input
def consumeOutputFromStdin(from_stream, to_stream):
    chunk = from_stream.read(1024)
    while chunk:
        to_stream.write(chunk)
        to_stream.flush()
        chunk = from_stream.read(1024)

def consumeOutputFromFirstCommand(from_stream, to_stream):
    while True:
        unprocessed_line = from_stream.readline()
        if not unprocessed_line:
            break
        processed_line = some_python_function_that_processes_line(unprocessed_line)
        to_stream.write(processed_line)
        to_stream.flush()

def consumeOutputFromSecondCommand(from_stream, to_stream):
    while True:
        unprocessed_line = from_stream.readline()
        if not unprocessed_line:
            break
        processed_line = a_different_python_function_that_processes_line(unprocessed_line)
        to_stream.write(processed_line)
        to_stream.flush()
When I run this, the script hangs:
$ echo foo | ./myConversionScript.py
** hangs here... **
If I hit Ctrl-C to terminate the script, the code is stuck on the line third_thread.join():
C-c C-c
Traceback (most recent call last):
  File "./myConversionScript.py", line 786, in <module>
    sys.exit(main(*sys.argv))
  File "./myConversionScript.py", line 556, in main
    third_thread.join()
  File "/home/foo/proj/tools/lib/python2.7/threading.py", line 949, in join
    self.__block.wait()
  File "/home/foo/proj/tools/lib/python2.7/threading.py", line 339, in wait
    waiter.acquire()
KeyboardInterrupt
If I don't use a third_process and third_thread, instead only passing data from the output of the first thread to the input of the second thread, there is no hang.
Something about the third thread seems to cause things to break, but I don't know why.
I thought the point of communicate() is that it will handle I/O for the three processes, so I'm not sure why there is an I/O hang.
How do I get three or more commands/processes working together, where one thread consumes the output of another thread/process?
UPDATE
Okay, I made some changes that seem to help, based on some comments here and on other sites. The processes are made to wait() for completion, and within the thread methods, I close() the pipes once the thread has processed all the data that it can. My concern is that memory usage will be very high for large datasets, but at least things are working:
first_process = subprocess.Popen(['firstCommand', '-'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
second_process = subprocess.Popen(['secondCommand', '-'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
third_process = subprocess.Popen(['thirdCommand', '-'], stdin=subprocess.PIPE, stdout=sys.stdout)

first_thread = threading.Thread(target=consumeOutputFromStdin, args=(sys.stdin, first_process.stdin))
second_thread = threading.Thread(target=consumeOutputFromFirstCommand, args=(first_process.stdout, second_process.stdin))
third_thread = threading.Thread(target=consumeOutputFromSecondCommand, args=(second_process.stdout, third_process.stdin))

first_thread.start()
second_thread.start()
third_thread.start()

first_thread.join()
second_thread.join()
third_thread.join()

first_process.wait()
second_process.wait()
third_process.wait()

# read 1K chunks from standard input
def consumeOutputFromStdin(from_stream, to_stream):
    chunk = from_stream.read(1024)
    while chunk:
        to_stream.write(chunk)
        to_stream.flush()
        chunk = from_stream.read(1024)

def consumeOutputFromFirstCommand(from_stream, to_stream):
    while True:
        unprocessed_line = from_stream.readline()
        if not unprocessed_line:
            from_stream.close()
            to_stream.close()
            break
        processed_line = some_python_function_that_processes_line(unprocessed_line)
        to_stream.write(processed_line)
        to_stream.flush()

def consumeOutputFromSecondCommand(from_stream, to_stream):
    while True:
        unprocessed_line = from_stream.readline()
        if not unprocessed_line:
            from_stream.close()
            to_stream.close()
            break
        processed_line = a_different_python_function_that_processes_line(unprocessed_line)
        to_stream.write(processed_line)
        to_stream.flush()
To emulate:
echo foo |
firstCommand - | somePythonRoutine - |
secondCommand - | anotherPythonRoutine - |
thirdCommand - > finalOutput
your current approach with threads works:
from subprocess import Popen, PIPE

first = Popen(["firstCommand", "-"], stdin=PIPE, stdout=PIPE, bufsize=1)
second = Popen(["secondCommand", "-"], stdin=PIPE, stdout=PIPE, bufsize=1)
bind(first.stdout, second.stdin, somePythonRoutine)
with open("finalOutput", "wb") as file:
    third = Popen(["thirdCommand", "-"], stdin=PIPE, stdout=file, bufsize=1)
    bind(second.stdout, third.stdin, anotherPythonRoutine)

# provide input for the pipeline
first.stdin.write(b"foo")
first.stdin.close()

# wait for it to complete
pipestatus = [p.wait() for p in [first, second, third]]
where each bind() starts a new thread:
from threading import Thread

def bind(input_pipe, output_pipe, line_filter):
    def f():
        try:
            for line in iter(input_pipe.readline, b''):
                line = line_filter(line)
                if line:
                    output_pipe.write(line)  # no flush unless newline present
        finally:
            try:
                output_pipe.close()
            finally:
                input_pipe.close()
    t = Thread(target=f)
    t.daemon = True  # die if the program exits
    t.start()
and somePythonRoutine, anotherPythonRoutine accept a single line and return it (possibly modified).
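For instance, a trivial filter satisfying that contract (the pipes here carry bytes, so the filter receives and returns bytes) could be:

def somePythonRoutine(line):
    # uppercase each line; returning b'' or None drops the line
    return line.upper()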
The point of communicate() is that it returns the output of the process. This collides with your pipe setup.
You should only call it once on the third process; all the other ones are connected via pipes and know how to communicate with each other - no other / manual intervention is necessary.

AttributeError: 'module' object has no attribute 'kill'

Here is my code:
def cmdoutput(cmd1, flag):
    finish = time.time() + 50
    p = subprocess.Popen(cmd1, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
    while p.poll() is None:
        time.sleep(1)
        if finish < time.time():
            os.kill(p.pid, signal.SIGTERM)
            print "timed out and killed child, collecting what output exists so far"
    if (flag == "1"):  # To enable container
        out, err = p.communicate(input='container\nzone1')
    else:
        out, err = p.communicate()
    print (out)
    return out
When I run this script, I get
AttributeError: 'module' object has no attribute 'kill'.
What's wrong with my code?
I think you have your own os.py.
Put print os.__file__ before the os.kill(...) line, and you will see what's going on.
UPDATE
os.kill is only available on Unix in Jython.
Instead of os.kill(...), use p.kill().
UPDATE
p.kill() does not work (at least on Windows + Jython 2.5.2, 2.5.3): p.pid is None.
See http://bugs.jython.org/issue1898
Change your code as follows. Change CPYTHON_EXECUTABLE_PATH and CMDOUTPUT_SCRIPT_PATH.

CPYTHON_EXECUTABLE_PATH = r'c:\python27\python.exe'  # Change path to python.exe
CMDOUTPUT_SCRIPT_PATH = r'c:\users\falsetru\cmdoutput.py'  # Change path to the script

def cmdoutput(cmd1, flag):
    # the command itself must be passed along to the CPython script, too
    return subprocess.check_output([CPYTHON_EXECUTABLE_PATH, CMDOUTPUT_SCRIPT_PATH, cmd1, flag])
Save the following code as cmdoutput.py:

import subprocess
import sys
import time

def cmdoutput(cmd1, flag):
    finish = time.time() + 50
    p = subprocess.Popen(cmd1, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
    while p.poll() is None:
        time.sleep(1)
        if finish < time.time():
            p.kill()
            return '<<timeout>>'
    if flag == "1":
        out, err = p.communicate('container\nzone1')
    else:
        out, err = p.communicate()
    return out

if __name__ == '__main__':
    cmd, flag = sys.argv[1:3]
    print(cmdoutput(cmd, flag))

python subprocess with timeout and large output (>64K)

I want to execute a process, limit the execution time by some timeout in seconds, and grab the output produced by the process. And I want to do this on Windows, Linux, and FreeBSD.
I have tried implementing this in three different ways:
cmd - without a timeout, using subprocess.PIPE for output capture.
BEHAVIOUR: Operates as expected, but does not support a timeout; I need a timeout.
cmd_to - with a timeout, using subprocess.PIPE for output capture.
BEHAVIOUR: Blocks subprocess execution when output >= 2^16 bytes.
cmd_totf - with a timeout, using tempfile.NamedTemporaryFile for output capture.
BEHAVIOUR: Operates as expected, but uses temporary files on disk.
These are available below for closer inspection.
As can be seen in the output below, the timeout code blocks the execution of the subprocess when using subprocess.PIPE and the output from the subprocess is >= 2^16 bytes.
The subprocess documentation states that this is expected when calling process.wait() and using subprocess.PIPE; however, no warnings are given for process.poll(), so what is going wrong here?
I have a solution in cmd_totf, which uses the tempfile module, but the tradeoff is that it writes the output to disk, something I would REALLY like to avoid.
So my questions are:
What am I doing wrong in cmd_to?
Is there a way to do what I want without using tempfiles, i.e. keeping the output in memory?
Script to generate a bunch of output ('exp_gen.py'):
#!/usr/bin/env python
import sys
output = "b"*int(sys.argv[1])
print output
Three different implementations (cmd, cmd_to, cmd_totf) of wrappers around subprocess.Popen:
#!/usr/bin/env python
import subprocess, time, tempfile

bufsize = -1

def cmd(cmdline, timeout=60):
    """
    Execute cmdline.
    Uses subprocess and subprocess.PIPE.
    """
    p = subprocess.Popen(
        cmdline,
        bufsize = bufsize,
        shell   = False,
        stdin   = subprocess.PIPE,
        stdout  = subprocess.PIPE,
        stderr  = subprocess.PIPE
    )
    out, err = p.communicate()
    returncode = p.returncode
    return (returncode, err, out)

def cmd_to(cmdline, timeout=60):
    """
    Execute cmdline, limit execution time to 'timeout' seconds.
    Uses subprocess and subprocess.PIPE.
    """
    p = subprocess.Popen(
        cmdline,
        bufsize = bufsize,
        shell   = False,
        stdin   = subprocess.PIPE,
        stdout  = subprocess.PIPE,
        stderr  = subprocess.PIPE
    )

    t_begin = time.time()  # Monitor execution time
    seconds_passed = 0
    while p.poll() is None and seconds_passed < timeout:
        seconds_passed = time.time() - t_begin
        time.sleep(0.1)

    #if seconds_passed > timeout:
    #
    #    try:
    #        p.stdout.close()  # If they are not closed the fds will hang around until
    #        p.stderr.close()  # os.fdlimit is exceeded and cause a nasty exception
    #        p.terminate()     # Important to close the fds prior to terminating the process!
    #                          # NOTE: Are there any other "non-freed" resources?
    #    except:
    #        pass
    #
    #    raise TimeoutInterrupt

    out, err = p.communicate()
    returncode = p.returncode
    return (returncode, err, out)

def cmd_totf(cmdline, timeout=60):
    """
    Execute cmdline, limit execution time to 'timeout' seconds.
    Uses subprocess and tempfile instead of subprocess.PIPE.
    """
    output = tempfile.NamedTemporaryFile(delete=False)
    error = tempfile.NamedTemporaryFile(delete=False)
    p = subprocess.Popen(
        cmdline,
        bufsize = 0,
        shell   = False,
        stdin   = None,
        stdout  = output,
        stderr  = error
    )

    t_begin = time.time()  # Monitor execution time
    seconds_passed = 0
    while p.poll() is None and seconds_passed < timeout:
        seconds_passed = time.time() - t_begin
        time.sleep(0.1)

    #if seconds_passed > timeout:
    #
    #    try:
    #        p.stdout.close()  # If they are not closed the fds will hang around until
    #        p.stderr.close()  # os.fdlimit is exceeded and cause a nasty exception
    #        p.terminate()     # Important to close the fds prior to terminating the process!
    #                          # NOTE: Are there any other "non-freed" resources?
    #    except:
    #        pass
    #
    #    raise TimeoutInterrupt

    p.wait()
    returncode = p.returncode

    fd = open(output.name)
    out = fd.read()
    fd.close()

    fd = open(error.name)
    err = fd.read()
    fd.close()

    error.close()
    output.close()

    return (returncode, err, out)

if __name__ == "__main__":
    implementations = [cmd, cmd_to, cmd_totf]
    bytes = ['65535', '65536', str(1024*1024)]
    timeouts = [5]
    for timeout in timeouts:
        for size in bytes:
            for i in implementations:
                t_begin = time.time()
                seconds_passed = 0
                rc, err, output = i(['exp_gen.py', size], timeout)
                seconds_passed = time.time() - t_begin
                filler = ' '*(8-len(i.func_name))
                print "[%s%s: timeout=%d, iosize=%s, seconds=%f]" % (repr(i.func_name), filler, timeout, size, seconds_passed)
Output from execution:
['cmd' : timeout=5, iosize=65535, seconds=0.016447]
['cmd_to' : timeout=5, iosize=65535, seconds=0.103022]
['cmd_totf': timeout=5, iosize=65535, seconds=0.107176]
['cmd' : timeout=5, iosize=65536, seconds=0.028105]
['cmd_to' : timeout=5, iosize=65536, seconds=5.116658]
['cmd_totf': timeout=5, iosize=65536, seconds=0.104905]
['cmd' : timeout=5, iosize=1048576, seconds=0.025964]
['cmd_to' : timeout=5, iosize=1048576, seconds=5.128062]
['cmd_totf': timeout=5, iosize=1048576, seconds=0.103183]
Contrary to all the warnings in the subprocess documentation, directly reading from process.stdout and process.stderr has provided a better solution.
By better I mean that I can read output from a process that exceeds 2^16 bytes without having to temporarily store the output on disk.
The code follows:
import fcntl
import os
import subprocess
import time

class TimeoutInterrupt(Exception):
    pass

def nonBlockRead(output):
    fd = output.fileno()
    fl = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)
    try:
        return output.read()
    except:
        return ''

def cmd(cmdline, timeout=60):
    """
    Execute cmdline, limit execution time to 'timeout' seconds.
    Uses the subprocess module and subprocess.PIPE.

    Raises TimeoutInterrupt
    """
    p = subprocess.Popen(
        cmdline,
        bufsize = 0,     # default value of 0 (unbuffered) is best
        shell   = False, # not really needed; it's disabled by default
        stdout  = subprocess.PIPE,
        stderr  = subprocess.PIPE
    )

    t_begin = time.time()  # Monitor execution time
    seconds_passed = 0

    stdout = ''
    stderr = ''
    while p.poll() is None and seconds_passed < timeout:  # Monitor process
        time.sleep(0.1)  # Wait a little
        seconds_passed = time.time() - t_begin

        # p.std* blocks on read(), which messes up the timeout timer.
        # To fix this, we use a nonblocking read()
        # Note: Not sure if this is Windows compatible
        stdout += nonBlockRead(p.stdout)
        stderr += nonBlockRead(p.stderr)

    if seconds_passed >= timeout:
        try:
            p.stdout.close()  # If they are not closed the fds will hang around until
            p.stderr.close()  # os.fdlimit is exceeded and cause a nasty exception
            p.terminate()     # Important to close the fds prior to terminating the process!
                              # NOTE: Are there any other "non-freed" resources?
        except:
            pass

        raise TimeoutInterrupt

    returncode = p.returncode
    return (returncode, stdout, stderr)
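Typical usage then mirrors the earlier wrappers (a sketch, using the TimeoutInterrupt exception defined above):

try:
    returncode, stdout, stderr = cmd(['ls', '-l', '/etc'], timeout=5)
except TimeoutInterrupt:
    print('command timed out')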
Disclaimer: This answer is not tested on Windows or FreeBSD, and note that fcntl is a Unix-only module, so as written this will not work on Windows. I believe this should be a working answer to your question - it works for me (on Linux).
Here's code I just hacked to solve the problem on linux. It is a combination of several Stackoverflow threads and my own research in the Python 3 documents.
Main characteristics of this code:
Uses processes not threads for blocking I/O because they can more reliably be p.terminated()
Implements a retriggerable timeout watchdog that restarts counting whenever some output happens
Implements a long-term timeout watchdog to limit overall runtime
Can feed in stdin (although I only need to feed in one-time short strings)
Can capture stdout/stderr in the usual Popen means (Only stdout is coded, and stderr redirected to stdout; but can easily be separated)
It's almost realtime because it only checks every 0.2 seconds for output. But you could decrease this or remove the waiting interval easily
Lots of debugging printouts are still enabled, to see what's happening when.
The only code dependency is enum as implemented here, but the code could easily be changed to work without. It's only used to distinguish the two timeouts - use separate exceptions if you like.
Here's the code - as usual - feedback is highly appreciated:
(Edit 29-Jun-2012 - the code is now actually working)
# Python module runcmd
# Implements a class to launch shell commands which
# are killed after a timeout. Timeouts can be reset
# after each line of output
#
# Use inside other script with:
#
# import runcmd
# (return_code, out) = runcmd.RunCmd(['ls', '-l', '/etc'],
#                                    timeout_runtime,
#                                    timeout_no_output,
#                                    stdin_string).go()
#

import multiprocessing
import queue
import subprocess
import time
import enum

def timestamp():
    return time.strftime('%Y%m%d-%H%M%S')

class ErrorRunCmd(Exception): pass
class ErrorRunCmdTimeOut(ErrorRunCmd): pass

class Enqueue_output(multiprocessing.Process):
    def __init__(self, out, queue):
        multiprocessing.Process.__init__(self)
        self.out = out
        self.queue = queue
        self.daemon = True

    def run(self):
        try:
            for line in iter(self.out.readline, b''):
                #print('worker read:', line)
                self.queue.put(line)
        except ValueError: pass  # Readline of closed file
        self.out.close()

class Enqueue_input(multiprocessing.Process):
    def __init__(self, inp, iterable):
        multiprocessing.Process.__init__(self)
        self.inp = inp
        self.iterable = iterable
        self.daemon = True

    def run(self):
        #print("writing stdin")
        for line in self.iterable:
            self.inp.write(bytes(line, 'utf-8'))
        self.inp.close()
        #print("writing stdin DONE")

class RunCmd():
    """RunCmd - class to launch shell commands

    Captures and returns stdout. Kills child after a given
    amount (timeout_runtime) wallclock seconds. Can also
    kill after timeout_retriggerable wallclock seconds.
    This second timer is reset whenever the child does some
    output

    (return_code, out) = RunCmd(['ls', '-l', '/etc'],
                                timeout_runtime,
                                timeout_no_output,
                                stdin_string).go()
    """
    Timeout = enum.Enum('No', 'Retriggerable', 'Runtime')

    def __init__(self, cmd, timeout_runtime, timeout_retriggerable, stdin=None):
        self.dbg = False
        self.cmd = cmd
        self.timeout_retriggerable = timeout_retriggerable
        self.timeout_runtime = timeout_runtime
        self.timeout_hit = self.Timeout.No
        self.stdout = '--Cmd did not yield any output--'
        self.stdin = stdin

    def read_queue(self, q):
        time_last_output = None
        try:
            bstr = q.get(False)  # non-blocking
            if self.dbg: print('{} chars read'.format(len(bstr)))
            time_last_output = time.time()
            self.stdout += bstr
        except queue.Empty:
            #print('queue empty')
            pass
        return time_last_output

    def go(self):
        if self.stdin:
            pstdin = subprocess.PIPE
        else:
            pstdin = None
        p = subprocess.Popen(self.cmd, shell=False, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=pstdin)
        pin = None
        if (pstdin):
            pin = Enqueue_input(p.stdin, [self.stdin + '\n'])
            pin.start()
        q = multiprocessing.Queue()
        pout = Enqueue_output(p.stdout, q)
        pout.start()
        try:
            if self.dbg: print('Beginning subprocess with timeout {}/{} s on {}'.format(self.timeout_retriggerable, self.timeout_runtime, time.asctime()))
            time_begin = time.time()
            time_last_output = time_begin
            seconds_passed = 0
            self.stdout = b''
            once = True  # ensure loop's executed at least once
                         # some child cmds may exit very fast, but still produce output
            while once or p.poll() is None or not q.empty():
                once = False
                if self.dbg: print('a) {} of {}/{} secs passed and overall {} chars read'.format(seconds_passed, self.timeout_retriggerable, self.timeout_runtime, len(self.stdout)))
                tlo = self.read_queue(q)
                if tlo:
                    time_last_output = tlo
                now = time.time()
                if now - time_last_output >= self.timeout_retriggerable:
                    self.timeout_hit = self.Timeout.Retriggerable
                    raise ErrorRunCmdTimeOut(self)
                if now - time_begin >= self.timeout_runtime:
                    self.timeout_hit = self.Timeout.Runtime
                    raise ErrorRunCmdTimeOut(self)
                if q.empty():
                    time.sleep(0.1)
            # Final try to get "last-millisecond" output
            self.read_queue(q)
        finally:
            self._close(p, [pout, pin])
        return (self.returncode, self.stdout)

    def _close(self, p, procs):
        if self.dbg:
            if self.timeout_hit != self.Timeout.No:
                print('{} A TIMEOUT occurred: {}'.format(timestamp(), self.timeout_hit))
            else:
                print('{} No timeout occurred'.format(timestamp()))
        for process in [proc for proc in procs if proc]:
            try:
                process.terminate()
            except:
                print('{} Process termination raised trouble'.format(timestamp()))
                raise
        try:
            p.stdin.close()
        except: pass
        if self.dbg: print('{} _closed stdin'.format(timestamp()))
        try:
            p.stdout.close()  # If they are not closed the fds will hang around until
        except: pass
        if self.dbg: print('{} _closed stdout'.format(timestamp()))
        #p.stderr.close()  # os.fdlimit is exceeded and cause a nasty exception
        try:
            p.terminate()  # Important to close the fds prior to terminating the process!
                           # NOTE: Are there any other "non-freed" resources?
        except: pass
        if self.dbg: print('{} _closed Popen'.format(timestamp()))
        try:
            self.stdout = self.stdout.decode('utf-8')
        except: pass
        self.returncode = p.returncode
        if self.dbg: print('{} _closed all'.format(timestamp()))
Use with:
import runcmd

cmd = ['ls', '-l', '/etc']
worker = runcmd.RunCmd(cmd,
                       40,  # limit runtime [wallclock seconds]
                       2,   # limit runtime after last output [wallclk secs]
                       ''   # stdin input string
                       )
(return_code, out) = worker.go()

if worker.timeout_hit != worker.Timeout.No:
    print('A TIMEOUT occurred: {}'.format(worker.timeout_hit))
else:
    print('No timeout occurred')

print("Running '{:s}' returned {:d} and {:d} chars of output".format(' '.join(cmd), return_code, len(out)))
print('Output:')
print(out)
command - the first argument - should be a list of a command and its arguments. It is used for the Popen(shell=False) call, and the timeouts are in seconds. There's currently no code to disable the timeouts; set timeout_no_output to timeout_runtime to effectively disable the retriggerable timeout.
stdin_string can be any string which is to be sent to the command's standard input. Set to None if your command does not need any input. If a string is provided, a final '\n' is appended.
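For what it's worth, on modern Python (3.5+) much of this machinery is built in: subprocess.run accepts a timeout argument and raises subprocess.TimeoutExpired after killing the child (capture_output requires 3.7+; whether exc.stdout contains partial output on timeout varies by platform and Python version). A minimal sketch:

import subprocess

try:
    p = subprocess.run(['ls', '-l', '/etc'], capture_output=True, timeout=40)
    print(p.stdout.decode())
except subprocess.TimeoutExpired as exc:
    print('timed out; partial output (may be None):', exc.stdout)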
