why egrep's stdout did not go through pipe?

why egrep's stdout did not go through pipe? - python

i got a weird problem regarding egrep and pipe
I tried to filter a stream containing some lines who start with a topic name, such as
"TICK:this is a tick message\n"
When I try to use egrep to filter it :
./stream_generator | egrep 'TICK' | ./topic_processor
It seems that the topic_processor never receives any messages
However, when i use the following python script:
./stream_generator | python filter.py --topics TICK | ./topic_processor
everything looks to be fine.
I guess there need to be a 'flush' mechanism for egrep as well, is this correct?
Can anyone here give me a clue? Thanks a million
import sys
from optparse import OptionParser
if __name__ == '__main__':
parser = OptionParser()
parser.add_option("-m", "--topics",
action="store", type="string", dest="topics")
(opts, args) = parser.parse_args()
topics = opts.topics.split(':')
while True:
s = sys.stdin.readline()
for each in topics:
if s[0:4] == each:
sys.stdout.write(s)
sys.stdout.flush()

Have you allowed the command ./stream_generator | egrep 'TICK' | ./topic_processor to run to completion? If the command has completed without producing output then the problem does not lie with buffering since, upon the termination of ./stream_generator, egrep will flush any of its buffers and in turn terminate.
Now, it is true that egrep will use heavy buffering when not outputting directly to a terminal (i.e. when outputting to a pipe or file), and it may appear for a while that egrep produces no output if not enough data has accumulated in egrep's buffer to warrant a flush. This behaviour can be changed in GNU egrep by using the --line-buffered option:
./stream_generator | egrep --line-buffered 'TICK' | ./topic_processor

Related

Nagios python script wait for execution

I have this script which is run by nagios which checks the provider API if mitigation is enabled and reports back. I just copied an example from Nagios, I have no Python knowledge whatsoever. The problem is that sometimes the scripts needs 10 seconds to run and python just continues, so I need it to wait for execution.
I found some examples using subprocess which were successful but I don't know how to add the .readline and .strip to the command.
This is the original script:
#!/usr/bin/python
import os, sys
mitigation_enabled=os.popen("/usr/local/nagios/libexec/check_mitigation.py
| grep auto | awk '{print $2}'").readline().strip()
if mitigation_enabled == "false":
print "OK - Mitigation disabled." .format (mitigation_enabled)
sys.exit(0)
elif mitigation_enabled == "true":
print "WARNING - Mitigation enabled." .format (mitigation_enabled)
sys.exit(1)
else:
print "UKNOWN - mitigation status unknown." .format (mitigation_enabled)
sys.exit(2)
So how do I make this with subprocess, wait for the execution of the external script and add the .readline and .strip values?
Short question, how to make this work :)
Thanks!

You are complaining that the ancient historic API permits "short reads" of zero bytes.
Yes, that is correct, it is working as designed.
Recommend you use subprocess directly.
Also, two nits about awk '{print $2}':
Delete the grep by invoking awk '/auto/ {print $2}'.
Delete the overhead of the awk child by using split().

Reading only file names from s3 using python [duplicate]

How do I execute the following shell command using the Python subprocess module?
echo "input data" | awk -f script.awk | sort > outfile.txt
The input data will come from a string, so I don't actually need echo. I've got this far, can anyone explain how I get it to pipe through sort too?
p_awk = subprocess.Popen(["awk","-f","script.awk"],
stdin=subprocess.PIPE,
stdout=file("outfile.txt", "w"))
p_awk.communicate( "input data" )
UPDATE: Note that while the accepted answer below doesn't actually answer the question as asked, I believe S.Lott is right and it's better to avoid having to solve that problem in the first place!

You'd be a little happier with the following.
import subprocess
awk_sort = subprocess.Popen( "awk -f script.awk | sort > outfile.txt",
stdin=subprocess.PIPE, shell=True )
awk_sort.communicate( b"input data\n" )
Delegate part of the work to the shell. Let it connect two processes with a pipeline.
You'd be a lot happier rewriting 'script.awk' into Python, eliminating awk and the pipeline.
Edit. Some of the reasons for suggesting that awk isn't helping.
[There are too many reasons to respond via comments.]
Awk is adding a step of no significant value. There's nothing unique about awk's processing that Python doesn't handle.
The pipelining from awk to sort, for large sets of data, may improve elapsed processing time. For short sets of data, it has no significant benefit. A quick measurement of awk >file ; sort file and awk | sort will reveal of concurrency helps. With sort, it rarely helps because sort is not a once-through filter.
The simplicity of "Python to sort" processing (instead of "Python to awk to sort") prevents the exact kind of questions being asked here.
Python -- while wordier than awk -- is also explicit where awk has certain implicit rules that are opaque to newbies, and confusing to non-specialists.
Awk (like the shell script itself) adds Yet Another Programming language. If all of this can be done in one language (Python), eliminating the shell and the awk programming eliminates two programming languages, allowing someone to focus on the value-producing parts of the task.
Bottom line: awk can't add significant value. In this case, awk is a net cost; it added enough complexity that it was necessary to ask this question. Removing awk will be a net gain.
Sidebar Why building a pipeline (a | b) is so hard.
When the shell is confronted with a | b it has to do the following.
Fork a child process of the original shell. This will eventually become b.
Build an os pipe. (not a Python subprocess.PIPE) but call os.pipe() which returns two new file descriptors that are connected via common buffer. At this point the process has stdin, stdout, stderr from its parent, plus a file that will be "a's stdout" and "b's stdin".
Fork a child. The child replaces its stdout with the new a's stdout. Exec the a process.
The b child closes replaces its stdin with the new b's stdin. Exec the b process.
The b child waits for a to complete.
The parent is waiting for b to complete.
I think that the above can be used recursively to spawn a | b | c, but you have to implicitly parenthesize long pipelines, treating them as if they're a | (b | c).
Since Python has os.pipe(), os.exec() and os.fork(), and you can replace sys.stdin and sys.stdout, there's a way to do the above in pure Python. Indeed, you may be able to work out some shortcuts using os.pipe() and subprocess.Popen.
However, it's easier to delegate that operation to the shell.

import subprocess
some_string = b'input_data'
sort_out = open('outfile.txt', 'wb', 0)
sort_in = subprocess.Popen('sort', stdin=subprocess.PIPE, stdout=sort_out).stdin
subprocess.Popen(['awk', '-f', 'script.awk'], stdout=sort_in,
stdin=subprocess.PIPE).communicate(some_string)

To emulate a shell pipeline:
from subprocess import check_call
check_call('echo "input data" | a | b > outfile.txt', shell=True)
without invoking the shell (see 17.1.4.2. Replacing shell pipeline):
#!/usr/bin/env python
from subprocess import Popen, PIPE
a = Popen(["a"], stdin=PIPE, stdout=PIPE)
with a.stdin:
with a.stdout, open("outfile.txt", "wb") as outfile:
b = Popen(["b"], stdin=a.stdout, stdout=outfile)
a.stdin.write(b"input data")
statuses = [a.wait(), b.wait()] # both a.stdin/stdout are closed already
plumbum provides some syntax sugar:
#!/usr/bin/env python
from plumbum.cmd import a, b # magic
(a << "input data" | b > "outfile.txt")()
The analog of:
#!/bin/sh
echo "input data" | awk -f script.awk | sort > outfile.txt
is:
#!/usr/bin/env python
from plumbum.cmd import awk, sort
(awk["-f", "script.awk"] << "input data" | sort > "outfile.txt")()

The accepted answer is sidestepping actual question.
here is a snippet that chains the output of multiple processes:
Note that it also prints the (somewhat) equivalent shell command so you can run it and make sure the output is correct.
#!/usr/bin/env python3
from subprocess import Popen, PIPE
# cmd1 : dd if=/dev/zero bs=1m count=100
# cmd2 : gzip
# cmd3 : wc -c
cmd1 = ['dd', 'if=/dev/zero', 'bs=1M', 'count=100']
cmd2 = ['tee']
cmd3 = ['wc', '-c']
print(f"Shell style : {' '.join(cmd1)} | {' '.join(cmd2)} | {' '.join(cmd3)}")
p1 = Popen(cmd1, stdout=PIPE, stderr=PIPE) # stderr=PIPE optional, dd is chatty
p2 = Popen(cmd2, stdin=p1.stdout, stdout=PIPE)
p3 = Popen(cmd3, stdin=p2.stdout, stdout=PIPE)
print("Output from last process : " + (p3.communicate()[0]).decode())
# thoretically p1 and p2 may still be running, this ensures we are collecting their return codes
p1.wait()
p2.wait()
print("p1 return: ", p1.returncode)
print("p2 return: ", p2.returncode)
print("p3 return: ", p3.returncode)

http://www.python.org/doc/2.5.2/lib/node535.html covered this pretty well. Is there some part of this you didn't understand?
Your program would be pretty similar, but the second Popen would have stdout= to a file, and you wouldn't need the output of its .communicate().

Inspired by #Cristian's answer. I met just the same issue, but with a different command. So I'm putting my tested example, which I believe could be helpful:
grep_proc = subprocess.Popen(["grep", "rabbitmq"],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE)
subprocess.Popen(["ps", "aux"], stdout=grep_proc.stdin)
out, err = grep_proc.communicate()
This is tested.
What has been done
Declared lazy grep execution with stdin from pipe. This command will be executed at the ps command execution when the pipe will be filled with the stdout of ps.
Called the primary command ps with stdout directed to the pipe used by the grep command.
Grep communicated to get stdout from the pipe.
I like this way because it is natural pipe conception gently wrapped with subprocess interfaces.

The previous answers missed an important point. Replacing shell pipeline is basically correct, as pointed out by geocar. It is almost sufficient to run communicate on the last element of the pipe.
The remaining problem is passing the input data to the pipeline. With multiple subprocesses, a simple communicate(input_data) on the last element doesn't work - it hangs forever. You need to create a a pipeline and a child manually like this:
import os
import subprocess
input = """\
input data
more input
""" * 10
rd, wr = os.pipe()
if os.fork() != 0: # parent
os.close(wr)
else: # child
os.close(rd)
os.write(wr, input)
os.close(wr)
exit()
p_awk = subprocess.Popen(["awk", "{ print $2; }"],
stdin=rd,
stdout=subprocess.PIPE)
p_sort = subprocess.Popen(["sort"],
stdin=p_awk.stdout,
stdout=subprocess.PIPE)
p_awk.stdout.close()
out, err = p_sort.communicate()
print (out.rstrip())
Now the child provides the input through the pipe, and the parent calls communicate(), which works as expected. With this approach, you can create arbitrary long pipelines without resorting to "delegating part of the work to the shell". Unfortunately the subprocess documentation doesn't mention this.
There are ways to achieve the same effect without pipes:
from tempfile import TemporaryFile
tf = TemporaryFile()
tf.write(input)
tf.seek(0, 0)
Now use stdin=tf for p_awk. It's a matter of taste what you prefer.
The above is still not 100% equivalent to bash pipelines because the signal handling is different. You can see this if you add another pipe element that truncates the output of sort, e.g. head -n 10. With the code above, sort will print a "Broken pipe" error message to stderr. You won't see this message when you run the same pipeline in the shell. (That's the only difference though, the result in stdout is the same). The reason seems to be that python's Popen sets SIG_IGN for SIGPIPE, whereas the shell leaves it at SIG_DFL, and sort's signal handling is different in these two cases.

EDIT: pipes is available on Windows but, crucially, doesn't appear to actually work on Windows. See comments below.
The Python standard library now includes the pipes module for handling this:
https://docs.python.org/2/library/pipes.html, https://docs.python.org/3.4/library/pipes.html
I'm not sure how long this module has been around, but this approach appears to be vastly simpler than mucking about with subprocess.

For me, the below approach is the cleanest and easiest to read
from subprocess import Popen, PIPE
def string_to_2_procs_to_file(input_s, first_cmd, second_cmd, output_filename):
with open(output_filename, 'wb') as out_f:
p2 = Popen(second_cmd, stdin=PIPE, stdout=out_f)
p1 = Popen(first_cmd, stdout=p2.stdin, stdin=PIPE)
p1.communicate(input=bytes(input_s))
p1.wait()
p2.stdin.close()
p2.wait()
which can be called like so:
string_to_2_procs_to_file('input data', ['awk', '-f', 'script.awk'], ['sort'], 'output.txt')

Call subprocess "ls -l folder | wc -l" in python can't be done [duplicate]

This question already has answers here:
How do I use subprocess.Popen to connect multiple processes by pipes?
(9 answers)
Closed 7 years ago.
I want to run this command using call subprocess
ls -l folder | wc -l
My code in Python file is here:
subprocess.call(["ls","-l","folder","|","wc","-l"])
I got an error message like this:
ls: cannot access |: No such file or directory
ls: cannot access wc: No such file or directory
It's like command |wc can't be read by call subprocess.
How can i fix it?

Try out the shell option using a string as first parameter:
subprocess.call("ls -l folder | wc -l",shell=True)
Although this work, note that using shell=True is not recommended since it can introduce a security issue through shell injection.

You can setup a command pipeline by connecting one process's stdout with another's stdin. In your example, errors and the final output are written to the screen, so I didn't try to redirect them. This is generally preferable to something like communicate because instead of waiting for one program to complete before starting another (and encouring the expense of moving the data into the parent) they run in parallel.
import subprocess
p1 = subprocess.Popen(["ls","-l"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["wc","-l"], stdin=p1.stdout)
# close pipe in parent, its still open in children
p1.stdout.close()
p2.wait()
p1.wait()

You'll need to implement the piping logic yourself to make it work properly.
def piped_call(prog1, prog2):
out, err = subprocess.call(prog1).communicate()
if err:
print(err)
return None
else:
return subprocess.call(prog2).communicate(out)

You could try using subprocess.PIPE, assuming you wanted to avoid using subprocess.call(..., shell=True).
import subprocess
# Run 'ls', sending output to a PIPE (shell equiv.: ls -l | ... )
ls = subprocess.Popen('ls -l folder'.split(),
stdout=subprocess.PIPE)
# Read output from 'ls' as input to 'wc' (shell equiv.: ... | wc -l)
wc = subprocess.Popen('wc -l'.split(),
stdin=ls.stdout,
stdout=subprocess.PIPE)
# Trap stdout and stderr from 'wc'
out, err = wc.communicate()
if err:
print(err.strip())
if out:
print(out.strip())
For Python 3 keep in mind the communicate() method used here will return a byte object instead of a string. :
In this case you will need to convert the output to a string using decode():
if err:
print(err.strip().decode())
if out:
print(out.strip().decode())

Get Video-Duration with avconv via Python

I need to get the duration of an Video for an application for Django. So I'll have to do this in python. But I'm really a beginner in this. So it would be nice, if you can help.
This is what I got so far:
import subprocess
task = subprocess.Popen("avconv -i video.mp4 2>&1 | grep Duration | cut -d ' ' -f 4 | sed -r 's/([^\.]*)\..*/\1/'", shell=True, stdout=subprocess.PIPE)
time = task.communicate()[0]
print time
I want to solve it with avconv because I'm allready using this at another point. The shell-command works well so far and gives me an output like:
HH:MM:SS.
But when I'm executing the python-code I just get an non-interpretable symbol on the shell.
Thanks a lot allready for your help!
Found a solution. Problem was the sed-part:
import os
import subprocess
task = subprocess.Popen("avconv -i video.mp4 2>&1 | grep Duration | cut -d ' ' -f 4 | sed -e 's/.\{4\}$//'", shell=True, stdout=subprocess.PIPE)
time = task.communicate()[0]
print time
Because it is allways the same part, it was enought to just cut the last 4 characters.

From python documentation:
Warning
Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
So you should really user communicate for that:
import subprocess
task = subprocess.Popen("avconv -i video.mp4 2>&1 | grep Duration | cut -d ' ' -f 4 | sed -r 's/([^\.]*)\..*/\1/'", shell=True, stdout=subprocess.PIPE)
time = task.communicate()[0]
print time
That way you can also catch stderr message, if any.

How to get subprocess' stdout data asynchronously?

I wrote a simple python script for my application and predefined some fast commands like make etc.
I've written a function for running system commands (linux):
def runCommand(commandLine):
print('############## Running command: ' + commandLine)
p = subprocess.Popen(commandLine, shell = True, stdout = subprocess.PIPE)
print (p.stdout.read().decode('utf-8'))
Everything works well except a few things:
I'm using cmake and it's output is colored. Any chances to save colors in output?
I can look at output after process has finished. For example, make runs for a long period of time but I can see the output only after full compilation. How to do it asynchronously?

I'm not sure about colors, but here's how to poll the subprocess's stdout one line at a time:
import subprocess
proc = subprocess.Popen('cmake', shell=True, stdout=subprocess.PIPE)
while proc.poll() is None:
output = proc.stdout.readline()
print output
Don't forget to read from stderr as well, as I'm sure cmake will emit information there.

You're not getting color because cmake detects if its stdout is a terminal, if it's not it doesn't color its own output. Some programs give you an option to force coloring output. Unfortunately cmake does not, so you're out of luck there. Unless you want to patch cmake yourself.
Lots of programs do this, for example grep:
# grep test test.txt
test
^
|
|------- this word is red
Now pipe it to cat:
# grep test test.txt | cat
test
^
|
|------- no longer red
grep option --color=always to force color:
# grep test test.txt --color=always | cat
test
^
|
|------- red again

Regarding how to get the output of your process before it finishes, it should be possible to do that replacing:
p.stdout.read
with:
for line in p.stdout:
Regarding how to save colored output, there isn't anything special about that. For example, if the row output is saved to a file, then next time cat <logfile> is executed the console will interpret the escape sequences and displaye the colors as expected.

For CMake specifically, you can force color output using the option CLICOLOR_FORCE=1:
command = 'make CLICOLOR_FORCE=1'
args = shlex.split(command)
proc = subprocess.Popen(args, stdout=subprocess.PIPE)
Then print as in the accepted answer:
while proc.poll() is None:
output = proc.stdout.readline()
print(output.decode('utf-8'))
If you decode to utf-8 before printing, you should see colored output.
If you print the result as a byte literal (i.e. without decoding), you should see the escape sequences for the various colors.
Consider trying the option universal_newlines=True:
proc = subprocess.Popen(args, stdout=subprocess.PIPE, universal_newlines=True)
This causes the call to proc.stdout.readline() to return a string instead of a byte literal so you can/must skip the call to decode().

To do async output do something like: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/440554
Not sure if you can capture the coloured output. If you can get the escaped colour codes you may be able to.

Worth noting here is the use of the script command as a pseudo terminal and be detected as a tty instead of redirect(pipe) file descriptor, see:
bash command preserve color when piping
Works like a charm...
As per the example in the question simply let script execute cmake:
import subprocess
proc = subprocess.Popen('script cmake', shell=True, stdout=subprocess.PIPE)
while proc.poll() is None:
output = proc.stdout.readline()
print output
This tricks cmake into thinking it's being executed from a terminal and will produce the ANSI candy you're after.
nJoy!

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.