I have an older Python 2.7.5 script which has suddenly started causing problems on Red Hat Enterprise Linux Server release 7.6 (Maipo). As far as I can tell, it runs fine on Red Hat Enterprise Linux Server release 7.4 (Maipo).
The script basically implements something like
cat /proc/cpuinfo | grep -m 1 -i 'cpu MHz'
by creating two subprocesses and piping the output of the first into the second (see the code example below). On the newer OS version, the cat processes stay open until the script terminates.
It seems that the pipe to grep somehow holds the cat process open, and I can't find any documentation on how to explicitly close it.
The issue can be reproduced by pasting this code into the python CLI and then checking the ps process list for a static process 'cat /proc/cpuinfo'.
The code breaks down what originally happens inside a loop, so please don't argue about its style. ;-)
import shlex
from subprocess import *
cmd1 = "cat /proc/cpuinfo"
cmd2 = "grep -m 1 -i 'cpu MHz'"
args1 = shlex.split(cmd1) # split into args
args2 = shlex.split(cmd2) # split into args
# first process uses default stdin
ps1 = Popen(args1, stdout=PIPE)
# then use the output of the previous process as stdin
ps2 = Popen(args2, stdin=ps1.stdout, stdout=PIPE)
out, err = ps2.communicate()
print(out)
Afterwards check the process list in a second session(!) with:
ps -eF |grep -v grep|grep /proc/cpuinfo
On RHEL 7.4 I find no open process in the process list, whereas on RHEL 7.6, after a few attempts, it looks like this:
[reinski#myhost ~]$ ps -eF |grep -v grep|grep /proc/cpuinfo
reinski 2422 89459 0 26993 356 142 18:46 pts/3 00:00:00 cat /proc/cpuinfo
reinski 2597 139605 0 26993 352 31 18:39 pts/3 00:00:00 cat /proc/cpuinfo
reinski 7809 139605 0 26993 352 86 18:03 pts/3 00:00:00 cat /proc/cpuinfo
These processes only disappear when I close the Python CLI, in which case I get errors like this (I left the formatting messed up as it was):
cat: write error: Broken pipe
cat: write errorcat: write error: Broken pipe
: Broken pipe
Why is cat apparently still trying to write to the pipe, even though it should have already output the whole of /proc/cpuinfo and terminated?
Or more importantly: how can I prevent this from happening?
Thanks for any help!
Example 2:
Following the suggestion from VPfB, it turned out that my example was a little unlucky, since the expected result can be achieved with a single grep command.
So here is a modified example to show the problem with piping in another way:
import shlex
from subprocess import *
cmd1 = "grep -m 1 -i 'cpu MHz' /proc/cpuinfo"
cmd2 = "awk '{print $4}'"
args1 = shlex.split(cmd1) # split into args
args2 = shlex.split(cmd2) # split into args
# first process uses default stdin
ps1 = Popen(args1, stdout=PIPE)
# then use the output of the previous process as stdin
ps2 = Popen(args2, stdin=ps1.stdout, stdout=PIPE)
out, err = ps2.communicate()
print(out)
This time, the result is a single zombie process for the grep command (169731 is the PID of the Python session):
[reinski#myhost ~]$ ps -eF|grep 169731
reinski 169731 189499 0 37847 6024 198 17:51 pts/2 00:00:00 python
reinski 193999 169731 0 0 0 142 17:53 pts/2 00:00:00 [grep] <defunct>
So, is this just another symptom of the same problem or am I doing something completely wrong here?
OK, it seems I just found a solution for the zombie processes left over by the examples:
Simply need to do a
ps1.communicate()
It seems this is required to close the pipe properly (see also the sketch below).
I'd expect this to happen when the second process's communicate() is called and it reads the pipe from the first process.
Can someone maybe point out to me, what I am missing here?
I am always willing to learn... ;-)
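For reference, here is a minimal sketch (my own summary, following the subprocess documentation's "Replacing shell pipeline" recipe) applied to the first example: the parent closes its copy of ps1.stdout after wiring it into ps2, so cat receives SIGPIPE instead of lingering once grep exits, and ps1 is explicitly waited on so it is not left as a zombie.
import shlex
from subprocess import Popen, PIPE

args1 = shlex.split("cat /proc/cpuinfo")
args2 = shlex.split("grep -m 1 -i 'cpu MHz'")

ps1 = Popen(args1, stdout=PIPE)
ps2 = Popen(args2, stdin=ps1.stdout, stdout=PIPE)
ps1.stdout.close()   # drop the parent's read end; only grep holds it now
out, err = ps2.communicate()
ps1.wait()           # reap cat so it doesn't show up as <defunct>
print(out)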
How do I execute the following shell command using the Python subprocess module?
echo "input data" | awk -f script.awk | sort > outfile.txt
The input data will come from a string, so I don't actually need echo. I've got this far; can anyone explain how I get it to pipe through sort too?
p_awk = subprocess.Popen(["awk","-f","script.awk"],
                         stdin=subprocess.PIPE,
                         stdout=file("outfile.txt", "w"))
p_awk.communicate( "input data" )
UPDATE: Note that while the accepted answer below doesn't actually answer the question as asked, I believe S.Lott is right and it's better to avoid having to solve that problem in the first place!
You'd be a little happier with the following.
import subprocess
awk_sort = subprocess.Popen( "awk -f script.awk | sort > outfile.txt",
                             stdin=subprocess.PIPE, shell=True )
awk_sort.communicate( b"input data\n" )
Delegate part of the work to the shell. Let it connect two processes with a pipeline.
You'd be a lot happier rewriting 'script.awk' into Python, eliminating awk and the pipeline.
Edit. Some of the reasons for suggesting that awk isn't helping.
[There are too many reasons to respond via comments.]
Awk is adding a step of no significant value. There's nothing unique about awk's processing that Python doesn't handle.
The pipelining from awk to sort, for large sets of data, may improve elapsed processing time. For short sets of data, it has no significant benefit. A quick measurement of awk >file ; sort file and awk | sort will reveal whether concurrency helps. With sort, it rarely helps because sort is not a once-through filter.
The simplicity of "Python to sort" processing (instead of "Python to awk to sort") prevents the exact kind of question being asked here (see the sketch after this list).
Python -- while wordier than awk -- is also explicit where awk has certain implicit rules that are opaque to newbies, and confusing to non-specialists.
Awk (like the shell script itself) adds Yet Another Programming language. If all of this can be done in one language (Python), eliminating the shell and the awk programming eliminates two programming languages, allowing someone to focus on the value-producing parts of the task.
Bottom line: awk can't add significant value. In this case, awk is a net cost; it added enough complexity that it was necessary to ask this question. Removing awk will be a net gain.
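To make the "Python to sort" point concrete, here is a minimal sketch (my illustration, with made-up sample data) that does the field extraction in Python and hands only the sorting off to sort:
import subprocess

lines = ["b 2", "a 1", "c 3"]   # stand-in for the real input data
second_fields = "\n".join(line.split()[1] for line in lines) + "\n"

with open("outfile.txt", "wb") as outfile:
    sort = subprocess.Popen(["sort"], stdin=subprocess.PIPE, stdout=outfile)
    sort.communicate(second_fields.encode())   # no awk: one pipe, one child process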
Sidebar: why building a pipeline (a | b) is so hard.
When the shell is confronted with a | b it has to do the following.
Fork a child process of the original shell. This will eventually become b.
Build an OS pipe (not a Python subprocess.PIPE) by calling os.pipe(), which returns two new file descriptors connected via a common buffer. At this point the process has stdin, stdout and stderr from its parent, plus a file that will be "a's stdout" and another that will be "b's stdin".
Fork a child. The child replaces its stdout with the new a's stdout. Exec the a process.
The b child replaces its stdin with the new b's stdin. Exec the b process.
The b child waits for a to complete.
The parent is waiting for b to complete.
I think that the above can be used recursively to spawn a | b | c, but you have to implicitly parenthesize long pipelines, treating them as if they're a | (b | c).
Since Python has os.pipe(), os.fork() and the os.exec*() family, and you can replace sys.stdin and sys.stdout, there's a way to do the above in pure Python. Indeed, you may be able to work out some shortcuts using os.pipe() and subprocess.Popen.
However, it's easier to delegate that operation to the shell.
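For the curious, here is a minimal sketch of those fork/pipe/exec steps using nothing but the os module (ls and wc -l stand in for a and b; error handling omitted):
import os

r, w = os.pipe()              # two fds connected via a common buffer

if os.fork() == 0:            # child that becomes "a"
    os.dup2(w, 1)             # a's stdout is now the pipe's write end
    os.close(r)
    os.close(w)
    os.execvp("ls", ["ls", "-l"])

if os.fork() == 0:            # child that becomes "b"
    os.dup2(r, 0)             # b's stdin is now the pipe's read end
    os.close(r)
    os.close(w)
    os.execvp("wc", ["wc", "-l"])

os.close(r)                   # the parent must not keep either end open
os.close(w)
os.wait()                     # wait for both children
os.wait()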
import subprocess
some_string = b'input_data'
sort_out = open('outfile.txt', 'wb', 0)
sort_in = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sort_out).stdin
subprocess.Popen(['awk', '-f', 'script.awk'], stdout=sort_in,
                 stdin=subprocess.PIPE).communicate(some_string)
To emulate a shell pipeline:
from subprocess import check_call
check_call('echo "input data" | a | b > outfile.txt', shell=True)
without invoking the shell (see 17.1.4.2. Replacing shell pipeline):
#!/usr/bin/env python
from subprocess import Popen, PIPE
a = Popen(["a"], stdin=PIPE, stdout=PIPE)
with a.stdin:
    with a.stdout, open("outfile.txt", "wb") as outfile:
        b = Popen(["b"], stdin=a.stdout, stdout=outfile)
    a.stdin.write(b"input data")
statuses = [a.wait(), b.wait()]  # both a.stdin/stdout are closed already
plumbum provides some syntax sugar:
#!/usr/bin/env python
from plumbum.cmd import a, b # magic
(a << "input data" | b > "outfile.txt")()
The analog of:
#!/bin/sh
echo "input data" | awk -f script.awk | sort > outfile.txt
is:
#!/usr/bin/env python
from plumbum.cmd import awk, sort
(awk["-f", "script.awk"] << "input data" | sort > "outfile.txt")()
The accepted answer sidesteps the actual question.
Here is a snippet that chains the output of multiple processes.
Note that it also prints the (somewhat) equivalent shell command, so you can run it and make sure the output is correct.
#!/usr/bin/env python3
from subprocess import Popen, PIPE
# cmd1 : dd if=/dev/zero bs=1M count=100
# cmd2 : tee
# cmd3 : wc -c
cmd1 = ['dd', 'if=/dev/zero', 'bs=1M', 'count=100']
cmd2 = ['tee']
cmd3 = ['wc', '-c']
print(f"Shell style : {' '.join(cmd1)} | {' '.join(cmd2)} | {' '.join(cmd3)}")
p1 = Popen(cmd1, stdout=PIPE, stderr=PIPE) # stderr=PIPE optional, dd is chatty
p2 = Popen(cmd2, stdin=p1.stdout, stdout=PIPE)
p3 = Popen(cmd3, stdin=p2.stdout, stdout=PIPE)
print("Output from last process : " + (p3.communicate()[0]).decode())
# theoretically p1 and p2 may still be running; this ensures we collect their return codes
p1.wait()
p2.wait()
print("p1 return: ", p1.returncode)
print("p2 return: ", p2.returncode)
print("p3 return: ", p3.returncode)
http://www.python.org/doc/2.5.2/lib/node535.html covered this pretty well. Is there some part of this you didn't understand?
Your program would be pretty similar, but the second Popen would have stdout= to a file, and you wouldn't need the output of its .communicate().
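A sketch of what that adaptation could look like (my illustration of the doc recipe, reusing the names from the question): awk's stdout feeds sort, sort's stdout goes to the output file, and the parent closes its copy of the intermediate pipe.
import subprocess

with open("outfile.txt", "wb") as outfile:
    p_awk = subprocess.Popen(["awk", "-f", "script.awk"],
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE)
    p_sort = subprocess.Popen(["sort"],
                              stdin=p_awk.stdout,
                              stdout=outfile)
    p_awk.stdout.close()                # per the docs: awk gets SIGPIPE if sort exits first
    p_awk.stdin.write(b"input data\n")
    p_awk.stdin.close()                 # awk sees EOF on stdin and can finish
    p_awk.wait()
    p_sort.wait()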
Inspired by #Cristian's answer. I ran into the same issue, just with a different command, so I'm sharing my tested example, which I believe could be helpful:
grep_proc = subprocess.Popen(["grep", "rabbitmq"],
                             stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE)
subprocess.Popen(["ps", "aux"], stdout=grep_proc.stdin)
out, err = grep_proc.communicate()
This is tested.
What has been done
Started the grep process first, with its stdin coming from a pipe. It does its work once ps runs and fills that pipe with its stdout.
Called the primary command ps with stdout directed to the pipe used by the grep command.
Called communicate() on the grep process to collect its stdout from the pipe.
I like this approach because it is the natural pipe concept, gently wrapped in the subprocess interfaces.
The previous answers missed an important point. Replacing shell pipeline is basically correct, as pointed out by geocar. It is almost sufficient to run communicate on the last element of the pipe.
The remaining problem is passing the input data to the pipeline. With multiple subprocesses, a simple communicate(input_data) on the last element doesn't work - it hangs forever. You need to create a pipeline and a child manually, like this:
import os
import subprocess
input = """\
input data
more input
""" * 10
rd, wr = os.pipe()
if os.fork() != 0:  # parent
    os.close(wr)
else:               # child
    os.close(rd)
    os.write(wr, input)
    os.close(wr)
    exit()

p_awk = subprocess.Popen(["awk", "{ print $2; }"],
                         stdin=rd,
                         stdout=subprocess.PIPE)
p_sort = subprocess.Popen(["sort"],
                          stdin=p_awk.stdout,
                          stdout=subprocess.PIPE)
p_awk.stdout.close()
out, err = p_sort.communicate()
print (out.rstrip())
Now the child provides the input through the pipe, and the parent calls communicate(), which works as expected. With this approach, you can create arbitrarily long pipelines without resorting to "delegating part of the work to the shell". Unfortunately the subprocess documentation doesn't mention this.
There are ways to achieve the same effect without pipes:
from tempfile import TemporaryFile
tf = TemporaryFile()
tf.write(input)
tf.seek(0, 0)
Now use stdin=tf for p_awk. It's a matter of taste what you prefer.
The above is still not 100% equivalent to bash pipelines because the signal handling is different. You can see this if you add another pipe element that truncates the output of sort, e.g. head -n 10. With the code above, sort will print a "Broken pipe" error message to stderr. You won't see this message when you run the same pipeline in the shell. (That's the only difference though; the result in stdout is the same.) The reason seems to be that Python's Popen sets SIG_IGN for SIGPIPE, whereas the shell leaves it at SIG_DFL, and sort's signal handling is different in these two cases.
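If shell-like behaviour is wanted here, one option (my suggestion, not part of the answer above) is to restore the default SIGPIPE handler in the child via preexec_fn; note that newer Python 3 releases already do this by default through the restore_signals argument.
import signal
import subprocess

def restore_sigpipe():
    # undo the inherited SIG_IGN so the child dies quietly on a broken pipe
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)

p_sort = subprocess.Popen(["sort"],
                          stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE,
                          preexec_fn=restore_sigpipe)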
EDIT: pipes is available on Windows but, crucially, doesn't appear to actually work on Windows. See comments below.
The Python standard library now includes the pipes module for handling this:
https://docs.python.org/2/library/pipes.html, https://docs.python.org/3.4/library/pipes.html
I'm not sure how long this module has been around, but this approach appears to be vastly simpler than mucking about with subprocess.
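For the pipeline from the question it might look something like this (a rough sketch based on the pipes documentation; file names taken from the question). Note that pipes builds and runs a /bin/sh command line under the hood.
import pipes

t = pipes.Template()
t.append('awk -f script.awk', '--')   # '--': reads stdin, writes stdout
t.append('sort', '--')
f = t.open('outfile.txt', 'w')        # writable file object feeding the pipeline
f.write('input data\n')
f.close()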
For me, the below approach is the cleanest and easiest to read
from subprocess import Popen, PIPE
def string_to_2_procs_to_file(input_s, first_cmd, second_cmd, output_filename):
    with open(output_filename, 'wb') as out_f:
        p2 = Popen(second_cmd, stdin=PIPE, stdout=out_f)
        p1 = Popen(first_cmd, stdout=p2.stdin, stdin=PIPE)
        p1.communicate(input=bytes(input_s))
        p1.wait()
        p2.stdin.close()
        p2.wait()
which can be called like so:
string_to_2_procs_to_file('input data', ['awk', '-f', 'script.awk'], ['sort'], 'output.txt')
I want to run this command using subprocess.call:
ls -l folder | wc -l
My code in Python file is here:
subprocess.call(["ls","-l","folder","|","wc","-l"])
I got an error message like this:
ls: cannot access |: No such file or directory
ls: cannot access wc: No such file or directory
It's as if the | wc part can't be handled by subprocess.call. How can I fix it?
Try the shell option, using a string as the first parameter:
subprocess.call("ls -l folder | wc -l", shell=True)
Although this works, note that using shell=True is not recommended, since it can introduce a security issue through shell injection.
You can set up a command pipeline by connecting one process's stdout to another's stdin. In your example, errors and the final output are written to the screen, so I didn't try to redirect them. This is generally preferable to something like communicate because, instead of waiting for one program to complete before starting another (and incurring the expense of moving the data into the parent), they run in parallel.
import subprocess
p1 = subprocess.Popen(["ls","-l"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["wc","-l"], stdin=p1.stdout)
# close the pipe in the parent; it's still open in the children
p1.stdout.close()
p2.wait()
p1.wait()
You'll need to implement the piping logic yourself to make it work properly.
import subprocess

def piped_call(prog1, prog2):
    # subprocess.call() only returns an exit code, so use Popen to capture output
    p1 = subprocess.Popen(prog1, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p1.communicate()
    if err:
        print(err)
        return None
    else:
        p2 = subprocess.Popen(prog2, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        return p2.communicate(out)
You could try using subprocess.PIPE, assuming you wanted to avoid using subprocess.call(..., shell=True).
import subprocess
# Run 'ls', sending output to a PIPE (shell equiv.: ls -l | ... )
ls = subprocess.Popen('ls -l folder'.split(),
                      stdout=subprocess.PIPE)
# Read output from 'ls' as input to 'wc' (shell equiv.: ... | wc -l)
wc = subprocess.Popen('wc -l'.split(),
                      stdin=ls.stdout,
                      stdout=subprocess.PIPE)
# Trap stdout and stderr from 'wc'
out, err = wc.communicate()
if err:
    print(err.strip())
if out:
    print(out.strip())
For Python 3, keep in mind that the communicate() method used here returns a bytes object instead of a string. In this case you will need to convert the output to a string using decode():
if err:
    print(err.strip().decode())
if out:
    print(out.strip().decode())
I've got a script parent.py trying to read stdout from a subprocess sub.py in Python.
The parent parent.py:
#!/usr/bin/python
import subprocess
p = subprocess.Popen("sub.py", stdout=subprocess.PIPE)
print p.stdout.read(1)
And the subprocess, sub.py:
#!/usr/bin/python
print raw_input( "hello world!" )
I would expect running parent.py to print the 'h' from "hello world!". Actually, it hangs. I can only get my expected behaviour by adding -u to sub.py's shebang line.
This confuses me because the -u switch makes no difference when sub.py is run directly from a shell; the shell is somehow privy to the unflushed output stream, unlike parent.py.
My goal is to run a C program as the subprocess, so I won't be able to control whether or not it flushes stdout. How is it that a shell has better access to a process's stdout than Python running the same thing from subprocess.Popen? Am I going to be able to read such a stdout stream from a C program that doesn't flush its buffers?
EDIT:
Here is an updated example based on korylprince's comment...
## capitalize.sh ##
#!/bin/sh
while [ 1 ]; do
    read s
    echo $s | tr '[:lower:]' '[:upper:]'
done
########################################
## parent.py ##
#!/usr/bin/python
from subprocess import Popen, PIPE
# cmd = [ 'capitalize.sh' ] # This would work
cmd = [ 'script', '-q', '-f', '-c', 'capitalize.sh', '/dev/null']
p = Popen(cmd, stdin=PIPE)
p.stdin.write("some string\n")
p.wait()
When running through script, I get a steady stream of newlines (and if this were a Python subprocess, it would raise an EOFError).
An alternative is
p = subprocess.Popen(["python", "-u", "sub.py"], stdout=subprocess.PIPE)
or the suggestions here.
My experience is that yes, you will be able to read from most C programs without any extra effort.
The Python interpreter takes extra steps to buffer its output which is why it needs the -u switch to disable output buffering. Your typical C program won't do this.
I haven't run into any program (C or otherwise) other than the Python interpreter that I expected to work and didn't within a subshell.
The reason the shell can read output immediately, regardless of "-u" is because the program you're launching from the shell has its output connected to a TTY. When the stdout is connected to a TTY, it is unbuffered (because it is up to the TTY to buffer). When you launch the python subprocess from within python, you're connecting stdout to a pipe, which means you're at the mercy of the subprocess to flush its output when it feels like it.
If you're looking to do complicated interactions with a subprocess, look into this tutorial.
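One common workaround (my own illustration, not from the answers above) is to hand the child a pseudo-terminal instead of a pipe, so that its C library believes it is writing to a TTY and switches to line buffering:
import os
import pty
import subprocess

master, slave = pty.openpty()                    # a pty pair instead of a plain pipe
p = subprocess.Popen(["./sub.py"], stdin=slave, stdout=slave)
os.close(slave)                                  # the parent only needs the master end
print(os.read(master, 1))                        # first byte arrives without an explicit flush
p.terminate()
p.wait()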
I want to pipe a Python script's output to a bash command. So far I have tried os.popen() and the subprocess module, and tried passing a pipe, for example:
os.popen('echo "P 1 1 591336 4927369 1 321 " | v.in.ascii -zn out=abcx format=standard --overwrite')
This didn't work. The values "591336" and "4927369" are variables that come from the output of the Python script. However, when I run the same echo command and pipe manually (in bash), with these or other values, it works.
v.in.ascii -zn out=abcx format=standard --overwrite
The above part of the command is part of GRASS GIS.
Can anyone help me?
You can just use print to output to stdout and pipe the Python process to the next process, e.g.
python myprogram.py | ...
Where myprogram.py might look like:
for x in something:
    print dosomething(x)
This works for me:
>>> stdin, stdout = os.popen2("echo %s | grep 'test'" % 'some test param')
>>> print stdout.read()
some test param
>>>
As of Python 2.6, the subprocess module is recommended instead of the deprecated os.popen. Here's an example:
from subprocess import Popen, PIPE
p = Popen(["v.in.ascii", "-zn", "out=abcx", "format=standard", "--overwrite"], stdin=PIPE)
p.stdin.write("P 1 1 591336 4927369 1 321\n")
p.stdin.close()
p.wait() # unless background execution preferred
I really like John Paulett's answer.
I think your echo example would work if you used os.system instead of os.popen.
One way to use popen here is like this:
f = os.popen("v.in.ascii -zn out=abcx format=standard --overwrite", 'w')
f.write("P 1 1 591336 4927369 1 321\n")
f.close()
(You have to specify that the pipe is for writing.)