I have a problem piping a simple subprocess.Popen.
Code:
import subprocess
cmd = 'cat file | sort -g -k3 | head -20 | cut -f2,3'
p = subprocess.Popen(cmd,shell=True,stdout=subprocess.PIPE)
for line in p.stdout:
    print(line.decode().strip())
Output for file ~1000 lines in length:
...
sort: write failed: standard output: Broken pipe
sort: write error
Output for file >241 lines in length:
...
sort: fflush failed: standard output: Broken pipe
sort: write error
Output for file <241 lines in length is fine.
I have been reading the docs and googling like mad but there is something fundamental about the subprocess module that I'm missing ... maybe to do with buffers. I've tried p.stdout.flush() and playing with the buffer size and p.wait(). I've tried to reproduce this with commands like 'sleep 20; cat moderatefile' but this seems to run without error.
From the recipes on subprocess docs:
# To replace shell pipeline like output=`dmesg | grep hda`
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
This is because you shouldn't use shell pipes in the command passed to subprocess.Popen (at least not without shell=True); instead, connect the processes with subprocess.PIPE like this:
from subprocess import Popen, PIPE
p1 = Popen(['cat', 'file'], stdout=PIPE)
p2 = Popen(['sort', '-g', '-k', '3'], stdin=p1.stdout, stdout=PIPE)
p3 = Popen(['head', '-20'], stdin=p2.stdout, stdout=PIPE)
p4 = Popen(['cut', '-f2,3'], stdin=p3.stdout, stdout=PIPE)
final_output = p4.stdout.read()
But I have to say that what you're trying to do could be done in pure Python instead of calling a bunch of shell commands.
I have been getting the same error. I even put the pipe in a bash script and executed that instead of the pipe in Python; from Python it would get the broken pipe error, from bash it wouldn't.
It seems to me that the last command before the head is raising an error because its (the sort's) stdout is closed. Python must be picking up on this, whereas in the shell the error is silent. I changed my code to consume the entire input and the error went away.
That would also explain why smaller files work: the pipe probably buffers the entire output before head exits, which is why it breaks on larger files.
e.g., instead of 'head -1' (in my case, I only wanted the first line), I used awk 'NR == 1'.
There are probably better ways of doing this depending on where the 'head -X' occurs in the pipe.
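That workaround, consuming sort's entire output in Python and slicing it there instead of letting head truncate the pipe, can be sketched like this (the helper name is illustrative, not from the original posts):

```python
from subprocess import Popen, PIPE

def first_n_sorted(path, n=20):
    # Sort numerically on column 3, as in the original pipeline, but read
    # sort's full output in Python and slice it ourselves, so sort never
    # sees a closed pipe and never reports "Broken pipe".
    p = Popen(["sort", "-g", "-k3", path], stdout=PIPE)
    out, _ = p.communicate()
    return out.decode().splitlines()[:n]
```

The head and cut stages then become plain Python slicing and string splitting.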
You don't need shell=True. Don't invoke the shell. This is how I would do it:
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
stdout_value = p.communicate()[0]
stdout_value # the output
See if you still face the buffering problem after using this.
Try using communicate() rather than reading directly from stdout. The Python docs say this:
"Warning Use communicate() rather than
.stdin.write, .stdout.read or
.stderr.read to avoid deadlocks due to
any of the other OS pipe buffers
filling up and blocking the child
process."
http://docs.python.org/library/subprocess.html#subprocess.Popen.stdout
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
output = p.communicate()[0]
for line in output.splitlines():
    # do stuff
Related
I want to mimic the below using python subprocess:
cat /tmp/myscript.sh | sh
The /tmp/myscript.sh contains:
ls -l
sleep 5
pwd
Behaviour: stdout shows the result of "ls" and the results of "pwd" are shown after 5 seconds.
What I have done is:
import subprocess
f = open("/tmp/myscript.sh", "rb")
p = subprocess.Popen("sh", shell=True, stdin=f,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
f.close()
p.stdout.read()
This waits until ALL the processing is done and shows the results all at once. The desired effect is to fill in the stdout pipe in realtime.
Note: this expectation may seem nonsensical, but it is a sample from a bigger, more complex situation which I cannot describe here.
Another Note: I can't use p.communicate. This whole thing is inside a select.select statement so I need stdout to be in a pipe.
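Given that note, here is a hedged sketch of draining the child's pipes with select.select as data arrives (POSIX only; the helper name is made up for illustration):

```python
import os
import select
import subprocess

def stream_select(argv):
    # Read a child's stdout/stderr incrementally, suitable for embedding
    # in a select.select() loop; returns the collected stdout bytes.
    p = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out = []
    fds = {p.stdout.fileno(): out, p.stderr.fileno(): []}
    while fds:
        ready, _, _ = select.select(list(fds), [], [])
        for fd in ready:
            chunk = os.read(fd, 1024)   # returns as soon as anything is buffered
            if chunk:
                fds[fd].append(chunk)
            else:                       # EOF on this descriptor
                del fds[fd]
    p.wait()
    return b"".join(out)
```

This avoids communicate() entirely, which matters when the pipes must stay available to a surrounding select loop.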
The problem is that when you don't give an argument to read(), it reads until EOF, which means it has to wait until the subprocess exits and the pipe is closed. If you call it with a small argument, it will return as soon as it has read that many characters.
import subprocess
f = open("/tmp/myscript.sh", "rb")
p = subprocess.Popen("sh", shell=True, stdin=f,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     encoding='utf-8')
f.close()
while True:
    c = p.stdout.read(1)
    if not c:
        break
    print(c, end='')
print()
Note that many programs buffer their output when stdout is connected to a pipe, so this might not solve the problem for everything. The shell doesn't buffer its own output, but ls probably does. Since ls produces all of its output at once, though, it won't be a problem in this case.
To solve the more general problem you may need to use a pty instead of a pipe. The pexpect library is useful for this.
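A minimal sketch of the pty approach using only the standard library (POSIX only; stream_through_pty is an illustrative name, not a pexpect API):

```python
import os
import pty
import subprocess

def stream_through_pty(argv):
    # Run argv with stdout attached to a pseudo-terminal. Most programs
    # line-buffer when they see a tty, so output arrives as it is produced
    # instead of in one big block at exit.
    master, slave = pty.openpty()
    proc = subprocess.Popen(argv, stdout=slave, stderr=slave, close_fds=True)
    os.close(slave)                      # parent only needs the master end
    chunks = []
    try:
        while True:
            data = os.read(master, 1024)  # returns as soon as data arrives
            if not data:
                break
            chunks.append(data)
            print(data.decode(), end="", flush=True)
    except OSError:                      # Linux raises EIO when the child exits
        pass
    os.close(master)
    proc.wait()
    return b"".join(chunks)
```

pexpect wraps the same mechanism with pattern matching on top.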
I have been trying for hours to get the output of a shell command as a string. I have tried both subprocess and os, neither of which has worked, and within subprocess I have tried check_output(), getoutput(), Popen(), communicate(), and everything else I've been able to find on this site and many others.
Sometimes I've had errors such as FileNotFoundError: [WinError 2] The system cannot find the file specified, though I have been able to fix these relatively swiftly. However, when the code does actually work and I try to print the output of the command, either it returns nothing (as in, it prints blank space), or it prints (b'', b'') or (b'', None).
decode() doesn't work, encoding doesn't change anything and I even tried:
subpr = str(process)
which, of course, did nothing.
How do you get the output of a shell command, as a string?
Other attempts:
subpr = (Popen(commandRun,shell=True,stdout=PIPE,stderr=PIPE,universal_newlines=True).communicate()[0])
process = subprocess.getoutput(commandRun)
process = subprocess.check_output(commandRun,shell=True)
process = subprocess.check_output(commandRun,stdout=PIPE,shell=True)
process = Popen(commandRun,stdout=PIPE,shell=True)
subpr = process.communicate()[0]
output = Popen(commandRun,shell=True,stdout=PIPE,stderr=PIPE)
subpr = output.communicate()
Imported:
import subprocess
from subprocess import Popen, PIPE
There is not much more code to add. I haven't written anything regarding subprocess other than that one broken line.
How are you trying to use these?
I have the following code that works, redirecting STDERR to STDOUT, because I wanted to have them merged:
import subprocess
args = ["whoami"]
run = subprocess.run(args, text=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
print(run.stdout)
If you want to pipe processes together, the best way is probably to put the pipes in the arguments of Popen; see https://docs.python.org/3/library/subprocess.html#replacing-bin-sh-shell-command-substitution
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
How do I execute the following shell command using the Python subprocess module?
echo "input data" | awk -f script.awk | sort > outfile.txt
The input data will come from a string, so I don't actually need echo. I've got this far, can anyone explain how I get it to pipe through sort too?
p_awk = subprocess.Popen(["awk", "-f", "script.awk"],
                         stdin=subprocess.PIPE,
                         stdout=open("outfile.txt", "w"))
p_awk.communicate(b"input data")
UPDATE: Note that while the accepted answer below doesn't actually answer the question as asked, I believe S.Lott is right and it's better to avoid having to solve that problem in the first place!
You'd be a little happier with the following.
import subprocess
awk_sort = subprocess.Popen("awk -f script.awk | sort > outfile.txt",
                            stdin=subprocess.PIPE, shell=True)
awk_sort.communicate( b"input data\n" )
Delegate part of the work to the shell. Let it connect two processes with a pipeline.
You'd be a lot happier rewriting 'script.awk' into Python, eliminating awk and the pipeline.
Edit. Some of the reasons for suggesting that awk isn't helping.
[There are too many reasons to respond via comments.]
Awk is adding a step of no significant value. There's nothing unique about awk's processing that Python doesn't handle.
The pipelining from awk to sort, for large sets of data, may improve elapsed processing time. For short sets of data, it has no significant benefit. A quick measurement of awk >file ; sort file against awk | sort will reveal whether concurrency helps. With sort, it rarely helps because sort is not a once-through filter.
The simplicity of "Python to sort" processing (instead of "Python to awk to sort") prevents the exact kind of questions being asked here.
Python -- while wordier than awk -- is also explicit where awk has certain implicit rules that are opaque to newbies, and confusing to non-specialists.
Awk (like the shell script itself) adds Yet Another Programming language. If all of this can be done in one language (Python), eliminating the shell and the awk programming eliminates two programming languages, allowing someone to focus on the value-producing parts of the task.
Bottom line: awk can't add significant value. In this case, awk is a net cost; it added enough complexity that it was necessary to ask this question. Removing awk will be a net gain.
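To illustrate the point, a simple awk filter such as '{ print $2 }' piped into sort collapses into a few lines of Python (a hypothetical stand-in, since script.awk's contents aren't shown):

```python
def second_fields_sorted(text):
    # Equivalent of: awk '{ print $2 }' | sort
    # (awk prints an empty line for records with fewer than 2 fields;
    # this sketch simply skips them instead)
    fields = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) > 1:
            fields.append(parts[1])
    return sorted(fields)
```

No subprocesses, no pipes, nothing to deadlock.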
Sidebar: why building a pipeline (a | b) is so hard.
When the shell is confronted with a | b it has to do the following.
Fork a child process of the original shell. This will eventually become b.
Build an OS pipe. This is not a Python subprocess.PIPE but a call to os.pipe(), which returns two new file descriptors connected via a common buffer. At this point the process has stdin, stdout, and stderr from its parent, plus a file descriptor that will be "a's stdout" and one that will be "b's stdin".
Fork a child. The child replaces its stdout with the new a's stdout, then execs the a process.
The b child replaces its stdin with the new b's stdin, then execs the b process.
The b child waits for a to complete.
The parent is waiting for b to complete.
I think the above can be applied recursively to spawn a | b | c, but you have to implicitly parenthesize long pipelines, treating them as if they were a | (b | c).
Since Python has os.pipe(), os.exec() and os.fork(), and you can replace sys.stdin and sys.stdout, there's a way to do the above in pure Python. Indeed, you may be able to work out some shortcuts using os.pipe() and subprocess.Popen.
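A minimal sketch of those steps in pure Python, using os.pipe(), os.fork(), and os.execvp() as described above (POSIX only; error handling omitted):

```python
import os

def pipeline(a_cmd, b_cmd):
    # Run a_cmd | b_cmd with raw primitives; returns b's exit status.
    rd, wr = os.pipe()
    pid_a = os.fork()
    if pid_a == 0:            # this child becomes `a`
        os.close(rd)
        os.dup2(wr, 1)        # a's stdout -> write end of the pipe
        os.close(wr)
        os.execvp(a_cmd[0], a_cmd)
    pid_b = os.fork()
    if pid_b == 0:            # this child becomes `b`
        os.close(wr)
        os.dup2(rd, 0)        # b's stdin <- read end of the pipe
        os.close(rd)
        os.execvp(b_cmd[0], b_cmd)
    os.close(rd)
    os.close(wr)              # parent must drop both ends or b never sees EOF
    os.waitpid(pid_a, 0)
    _, status = os.waitpid(pid_b, 0)
    return os.WEXITSTATUS(status)
```

subprocess.Popen does essentially this for you, which is why delegating to it (or to the shell) is usually the right call.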
However, it's easier to delegate that operation to the shell.
import subprocess
some_string = b'input_data'
sort_out = open('outfile.txt', 'wb', 0)
sort_in = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sort_out).stdin
subprocess.Popen(['awk', '-f', 'script.awk'], stdout=sort_in,
                 stdin=subprocess.PIPE).communicate(some_string)
To emulate a shell pipeline:
from subprocess import check_call
check_call('echo "input data" | a | b > outfile.txt', shell=True)
without invoking the shell (see 17.1.4.2. Replacing shell pipeline):
#!/usr/bin/env python
from subprocess import Popen, PIPE
a = Popen(["a"], stdin=PIPE, stdout=PIPE)
with a.stdin:
    with a.stdout, open("outfile.txt", "wb") as outfile:
        b = Popen(["b"], stdin=a.stdout, stdout=outfile)
    a.stdin.write(b"input data")
statuses = [a.wait(), b.wait()]  # both a.stdin/stdout are closed already
plumbum provides some syntax sugar:
#!/usr/bin/env python
from plumbum.cmd import a, b # magic
(a << "input data" | b > "outfile.txt")()
The analog of:
#!/bin/sh
echo "input data" | awk -f script.awk | sort > outfile.txt
is:
#!/usr/bin/env python
from plumbum.cmd import awk, sort
(awk["-f", "script.awk"] << "input data" | sort > "outfile.txt")()
The accepted answer sidesteps the actual question. Here is a snippet that chains the output of multiple processes.
Note that it also prints the (somewhat) equivalent shell command so you can run it and make sure the output is correct.
#!/usr/bin/env python3
from subprocess import Popen, PIPE
# cmd1 : dd if=/dev/zero bs=1M count=100
# cmd2 : tee
# cmd3 : wc -c
cmd1 = ['dd', 'if=/dev/zero', 'bs=1M', 'count=100']
cmd2 = ['tee']
cmd3 = ['wc', '-c']
print(f"Shell style : {' '.join(cmd1)} | {' '.join(cmd2)} | {' '.join(cmd3)}")
p1 = Popen(cmd1, stdout=PIPE, stderr=PIPE) # stderr=PIPE optional, dd is chatty
p2 = Popen(cmd2, stdin=p1.stdout, stdout=PIPE)
p3 = Popen(cmd3, stdin=p2.stdout, stdout=PIPE)
print("Output from last process : " + (p3.communicate()[0]).decode())
# theoretically p1 and p2 may still be running; this ensures we collect their return codes
p1.wait()
p2.wait()
print("p1 return: ", p1.returncode)
print("p2 return: ", p2.returncode)
print("p3 return: ", p3.returncode)
http://www.python.org/doc/2.5.2/lib/node535.html covered this pretty well. Is there some part of this you didn't understand?
Your program would be pretty similar, but the second Popen would have stdout= to a file, and you wouldn't need the output of its .communicate().
Inspired by @Cristian's answer. I met just the same issue, but with a different command, so I'm putting my tested example here, which I believe could be helpful:
grep_proc = subprocess.Popen(["grep", "rabbitmq"],
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE)
subprocess.Popen(["ps", "aux"], stdout=grep_proc.stdin)
out, err = grep_proc.communicate()
This is tested.
What has been done:
Declared the lazy grep execution with stdin from a pipe. This command is executed when the ps command runs and fills the pipe with its stdout.
Called the primary command ps with stdout directed to the pipe used by the grep command.
Called communicate() on grep to collect its stdout from the pipe.
I like this approach because it is the natural pipe concept gently wrapped in subprocess interfaces.
The previous answers missed an important point. Replacing the shell pipeline is basically correct, as pointed out by geocar, and it is almost sufficient to run communicate on the last element of the pipe.
The remaining problem is passing the input data to the pipeline. With multiple subprocesses, a simple communicate(input_data) on the last element doesn't work: it hangs forever. You need to create a pipeline and a child manually, like this:
import os
import subprocess
input = """\
input data
more input
""" * 10
rd, wr = os.pipe()
if os.fork() != 0:  # parent
    os.close(wr)
else:               # child
    os.close(rd)
    os.write(wr, input.encode())
    os.close(wr)
    os._exit(0)     # don't run the parent's cleanup in the forked child

p_awk = subprocess.Popen(["awk", "{ print $2; }"],
                         stdin=rd,
                         stdout=subprocess.PIPE)
p_sort = subprocess.Popen(["sort"],
                          stdin=p_awk.stdout,
                          stdout=subprocess.PIPE)
p_awk.stdout.close()
out, err = p_sort.communicate()
print(out.rstrip())
Now the child provides the input through the pipe, and the parent calls communicate(), which works as expected. With this approach, you can create arbitrary long pipelines without resorting to "delegating part of the work to the shell". Unfortunately the subprocess documentation doesn't mention this.
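For small inputs, the same idea can be wrapped in a generic helper that feeds the first process and reads the last (a sketch; for large inputs the initial write should happen in a separate thread so the pipe buffer can't fill up and deadlock):

```python
from subprocess import Popen, PIPE

def chain(commands, input_bytes=b""):
    # commands is a list of argv lists, connected like cmd1 | cmd2 | ...
    procs = [Popen(commands[0], stdin=PIPE, stdout=PIPE)]
    for cmd in commands[1:]:
        procs.append(Popen(cmd, stdin=procs[-1].stdout, stdout=PIPE))
        procs[-2].stdout.close()        # let earlier stages see SIGPIPE
    procs[0].stdin.write(input_bytes)   # fine for small inputs only
    procs[0].stdin.close()
    out = procs[-1].communicate()[0]
    for p in procs:
        p.wait()
    return out
```

This keeps the "communicate on the last element" shape while still getting input into the front of the pipeline.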
There are ways to achieve the same effect without pipes:
from tempfile import TemporaryFile
tf = TemporaryFile()
tf.write(input.encode())
tf.seek(0, 0)
Now use stdin=tf for p_awk. It's a matter of taste what you prefer.
The above is still not 100% equivalent to bash pipelines because the signal handling is different. You can see this if you add another pipe element that truncates the output of sort, e.g. head -n 10. With the code above, sort will print a "Broken pipe" error message to stderr. You won't see this message when you run the same pipeline in the shell. (That's the only difference though, the result in stdout is the same). The reason seems to be that python's Popen sets SIG_IGN for SIGPIPE, whereas the shell leaves it at SIG_DFL, and sort's signal handling is different in these two cases.
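On Python 2 you could get the shell-like behaviour back by restoring the default SIGPIPE handler in the child via preexec_fn; note that Python 3.2+ already does this by default (Popen's restore_signals=True resets SIGPIPE to SIG_DFL before exec), so the error message should not appear on modern Python. A sketch:

```python
import signal
from subprocess import Popen, PIPE

def restore_sigpipe():
    # Python sets SIGPIPE to SIG_IGN at startup; put the default back so
    # the child dies silently on a broken pipe, as the shell leaves it.
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)

p_sort = Popen(["sort"], stdin=PIPE, stdout=PIPE,
               preexec_fn=restore_sigpipe)
out, _ = p_sort.communicate(b"b\na\n")
```

On Python 3 the preexec_fn is redundant but harmless.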
EDIT: pipes is available on Windows but, crucially, doesn't appear to actually work on Windows. See comments below.
The Python standard library used to include the pipes module for handling this: https://docs.python.org/2/library/pipes.html, https://docs.python.org/3.4/library/pipes.html
Note that pipes was deprecated in Python 3.11 and removed in 3.13 (PEP 594). On versions where it exists, this approach appears to be vastly simpler than mucking about with subprocess.
For me, the below approach is the cleanest and easiest to read
from subprocess import Popen, PIPE
def string_to_2_procs_to_file(input_s, first_cmd, second_cmd, output_filename):
    with open(output_filename, 'wb') as out_f:
        p2 = Popen(second_cmd, stdin=PIPE, stdout=out_f)
        p1 = Popen(first_cmd, stdout=p2.stdin, stdin=PIPE)
        p1.communicate(input=input_s.encode())
        p1.wait()
        p2.stdin.close()
        p2.wait()
which can be called like so:
string_to_2_procs_to_file('input data', ['awk', '-f', 'script.awk'], ['sort'], 'output.txt')
I have a bash script that returns the admin email for a domain, like the following.
whois -h $(whois "stackoverflow.com" | grep 'Registrar WHOIS Server:' | cut -f2- -d:) "stackoverflow.com" | grep 'Admin Email:' | cut -f2- -d:
I want to run this in a python file. I believe I need to use a subprocess but can't seem to get it working with the pipes and flags. Any help?
Yes, you can use subprocess with a pipe. I will illustrate with an example:
ps = subprocess.Popen(('whois', 'stackoverflow.com'), stdout=subprocess.PIPE)
output = subprocess.check_output(('grep', 'Registrar WHOIS'), stdin=ps.stdout)
ps.wait()
You can adjust this to your needs.
The easiest solution is to write the commands into a script file and execute that file.
If you don't want that, you can execute any command with
bash -c 'command'
This is covered in the Replacing Older Functions with the subprocess Module section of the docs.
The example there is this bash pipeline:
output=`dmesg | grep hda`
rewritten for subprocess as;
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
Note that in many cases, you don't need to handle all of the same edge cases that the shell handles in exactly the same way. But if you don't know what you need, it's better to be fully general like this.
Your $() does the same thing as the backticks in that example, your pipes are the same as the example's pipes, and your arguments aren't anything special.
So:
whois = Popen(['whois', 'stackoverflow.com'], stdout=PIPE)
grep = Popen(['grep', 'Registrar WHOIS Server:'], stdin=whois.stdout, stdout=PIPE)
whois.stdout.close()
cut = Popen(['cut', '-f2-', '-d:'], stdin=grep.stdout, stdout=PIPE)
grep.stdout.close()
inneroutput, _ = cut.communicate()
whois = Popen(['whois', '-h', inneroutput.decode().strip(), 'stackoverflow.com'], stdout=PIPE)
grep = Popen(['grep', 'Admin Email:'], stdin=whois.stdout, stdout=PIPE)
whois.stdout.close()
cut = Popen(['cut', '-f2-', '-d:'], stdin=grep.stdout)
grep.stdout.close()
cut.communicate()
If this seems like a mess, consider that:
Your original shell command is a mess.
If you actually know exactly what you're expecting the pipeline to do, you can skip a lot of it.
All of the stuff you're doing here could just be done directly in Python without the need for this whole mess.
You may be happier using a third-party library like plumbum.
How could you write the whole thing in Python without all this piping? For example, instead of using grep, you could use Python's re module. Or, since you're not even using a regular expression at all, just a simple in check. And likewise for cut:
whois = subprocess.run(['whois', 'stackoverflow.com'],
                       check=True, stdout=PIPE, encoding='utf-8').stdout
for line in whois.splitlines():
    if 'Registrar WHOIS Server:' in line:
        registrar = line.split(':', 1)[1].strip()
        break
whois = subprocess.run(['whois', '-h', registrar, 'stackoverflow.com'],
                       check=True, stdout=PIPE, encoding='utf-8').stdout
for line in whois.splitlines():
    if 'Admin Email:' in line:
        admin = line.split(':', 1)[1].strip()
        break
I want to run some shell commands from Python. I have a main.py which calls successive functions, and I find some of them easier to do in shell. The problem: I want to do all of this automatically!
I want to do this kind of code :
sort fileIn | uniq > fileOut
My problem is with the pipe character. I tried:
from subprocess import call
call(['sort ',FileOut,'|',' uniq '])
or
p1 = subprocess.Popen(['sort ', FileOut], stdout=subprocess.PIPE)
p2 = subprocess.Popen([" wc","-l"], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output,err = p2.communicate()
But none of this worked.
(NB: FileOut is a string)
You need to use shell=True, which causes your command to be run by the shell instead of via an exec syscall:
call('sort {0} | uniq'.format(FileOut), shell=True)
It's worth noting that, if you simply want the unique lines of a file in Python (in no particular order), it may be easier to do so without the shell:
unique_lines = set(open('filename').readlines())
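If the sorted order of `sort fileIn | uniq > fileOut` matters, the full pipeline maps to a short function like this (fileIn/fileOut are the names from the question; locale-dependent sort order may differ slightly from GNU sort):

```python
def sort_uniq(file_in, file_out):
    # Equivalent of: sort fileIn | uniq > fileOut
    # Iterating the file yields lines with their trailing newlines,
    # so writelines() reproduces the line structure unchanged.
    with open(file_in) as f:
        lines = sorted(set(f))
    with open(file_out, "w") as f:
        f.writelines(lines)
```

No subprocess, no pipe, and no shell quoting to get wrong.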
I got tired of always looking up the Popen documentation, so this is an abridged version of the utility function I use to wrap Popen. You can take the so (stdout) value returned from the first call and pass it as the input to the next call. You can also do error checking/parsing if you need to.
def run(command, input=None):
    process = subprocess.Popen(command, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, stdin=subprocess.PIPE,
                               shell=True)
    if input:
        so, se = process.communicate(input)
    else:
        so, se = process.communicate()
    rc = process.returncode
    return so, se, rc