subprocess.Popen() sends awk and grep lines differently than expected - python

On CentOS 7.2, I have a file called cpuload, which contains the latest CPU load data in the following format:
last 30 sec:
average load: 0
cpu0 total load: 0
cpu1 total load: 0
cpu2 total load: 0
cpu3 total load: 1
cpu4 total load: 0
cpu5 total load: 0
cpu6 total load: 0
cpu7 total load: 0
last sec:
average load: 1
cpu0 total load: 5
cpu1 total load: 1
cpu2 total load: 1
cpu3 total load: 3
cpu4 total load: 2
cpu5 total load: 1
cpu6 total load: 0
cpu7 total load: 0
I want to get the number after "average load:" in the "last sec" section.
Two CLI commands give me that information when I run them in a terminal:
grep 'average load:' cpuload | sed -n 's/.*load: //p' | tail -n1
and
awk 'NR > 2 && /average load:/ {print $3}' cpuload
But when I run them via subprocess.Popen() with shell=True, I only get stderr:
for:
import subprocess
cmd = ["grep", "'average load:'", "cpuload", "|", "sed", "-n", "'s/.*load: //p'", "|", "tail", "-n1"]
test = subprocess.Popen(cmd, stdout = subprocess.PIPE, stderr = subprocess.PIPE, shell=True)
test.stderr.read()
I get:
b"Usage: grep [OPTION]... PATTERN [FILE]...\nTry 'grep --help' for more information.\n"
and for:
import subprocess
cmd = ["awk", "'NR > 2 && /average load:/ {print $3}'", "cpuload"]
test = subprocess.Popen(cmd, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
test.stderr.read()
I get a different error:
b"awk: cmd. line:1: 'NR > 2 && /average load:/ {print $3}'\nawk: cmd. line:1: ^ invalid char ''' in expression\n"
even though I avoided using a pipe (|).
With shell=True I instead get:
b"Usage: awk [POSIX or GNU style options] -f progfile [--] file ...\nUsage: awk [POSIX or GNU style options] [--] 'program' file ...\nPOSIX options:\t\tGNU long options: (standard)\n\t-f progfile\t\t--file=progfile\n\t-F fs\t\t\t--field-separator=fs\n\t-v var=val\t\t--assign=var=val\nShort options:\t\tGNU long options: (extensions)\n\t-b\t\t\t--characters-as-bytes\n\t-c\t\t\t--traditional\n\t-C\t\t\t--copyright\n\t-d[file]\t\t--dump-variables[=file]\n\t-e 'program-text'\t--source='program-text'\n\t-E file\t\t\t--exec=file\n\t-g\t\t\t--gen-pot\n\t-h\t\t\t--help\n\t-L [fatal]\t\t--lint[=fatal]\n\t-n\t\t\t--non-decimal-data\n\t-N\t\t\t--use-lc-numeric\n\t-O\t\t\t--optimize\n\t-p[file]\t\t--profile[=file]\n\t-P\t\t\t--posix\n\t-r\t\t\t--re-interval\n\t-S\t\t\t--sandbox\n\t-t\t\t\t--lint-old\n\t-V\t\t\t--version\n\nTo report bugs, see node `Bugs' in `gawk.info', which is\nsection `Reporting Problems and Bugs' in the printed version.\n\ngawk is a pattern scanning and processing language.\nBy default it reads standard input and writes standard output.\n\nExamples:\n\tgawk '{ sum += $1 }; END { print sum }' file\n\tgawk -F: '{ print $1 }' /etc/passwd\n"
What am I doing wrong?

I have a file called cpuload, which contains the latest CPU load data ...I want to get the number after the "average load:" of the "last sec" bit
Why not just use simple Python code to get the value you are looking for?
with open('cpuload') as f:
    lines = [l.strip() for l in f.readlines()]
got_it = False
for line in lines:
    if got_it:
        parts = line.split(':')
        result = parts[-1].strip()
        print(result)
        break
    if line == 'last sec:':
        got_it = True
Output:
1

First case, with grep, sed, tail... and pipes:
You need the shell=True parameter for Popen and a single string for the command. We need to put quotes around the parameters:
import subprocess
cmd = "grep 'average load:' cpuload | sed -n 's/.*load: //p' | tail -n1"
output = ""
test = subprocess.Popen(cmd, stdout = subprocess.PIPE, stderr = subprocess.PIPE, shell = True)
while True:
    output += test.stdout.readline().decode("utf-8")
    if test.poll() is not None:
        break
print("output=<%s>" % (output))
Second case, without a pipe:
You don't need the shell=True parameter or a single command string. We don't put quotes around the parameters:
import subprocess
cmd = ["/usr/bin/awk", "NR > 2 && /^average load:/ {print $3}", "cpuload"]
output = ""
test = subprocess.Popen(cmd, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
while True:
    output += test.stdout.readline().decode("utf-8")
    if test.poll() is not None:
        break
print("output=<%s>" % (output))

The issue was passing the awk parameters to subprocess with ' around them, as detailed here.
Did not accept Ed Morton's comment, as it did not specify what should have been done.
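A minimal before/after sketch of that fix (without a shell, nothing strips the quotes for you, so they become part of the argument):
# Wrong: the single quotes become literal characters in awk's program text
cmd = ["awk", "'NR > 2 && /average load:/ {print $3}'", "cpuload"]
# Right: pass the program unquoted, since no shell is involved
cmd = ["awk", "NR > 2 && /average load:/ {print $3}", "cpuload"]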

Related

Getting Python to print to the command line (the output of the graphviz command gvpr)

I'm trying to get Python to run this command, which runs fine from my command prompt:
ccomps -x rel_graph.dot | gvpr -c "N[nNodes($G)<5]{delete(0,$)}" | dot | gvpack | sfdp -Goverlap=prism | gvmap -e | gvpr "BEGIN{int m,w,e = 0} N[fontcolor=='blue']{m += 1} N[fontcolor=='green']{e += 1} N[fontcolor=='red']{w += 1} END{print(m); print(w); print(e);}"
In Python, I'm using:
temp = subprocess.Popen("""ccomps -x rel_graph.dot | gvpr -c \
"N[nNodes($G)<5]{delete(0,$)}" | dot | gvpack | sfdp -Goverlap=prism \
| gvmap -e | gvpr 'BEGIN{int m,w,e = 0} \
N[fontcolor=="blue"]{m += 1} \
N[fontcolor=="green"]{e += 1} \
N[fontcolor=="red"]{w += 1} \
END{print(m); print(w); print(e);}'
""", shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
...and then read/print lines from temp. The issue is that Python doesn't print the last three print statements (all are integers) to standard output, or at least I wasn't able to find them. The rest of the gvpr program works fine from Python.
Thanks!
After some more work I changed the BEGIN quotation marks to double, and all the internal arguments to single, and that seems to have solved the issue.
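Based on that description, the corrected call would look something like this (a sketch: the final gvpr program is double-quoted for the shell, with single quotes inside it):
temp = subprocess.Popen("""ccomps -x rel_graph.dot | gvpr -c \
"N[nNodes($G)<5]{delete(0,$)}" | dot | gvpack | sfdp -Goverlap=prism \
| gvmap -e | gvpr "BEGIN{int m,w,e = 0} \
N[fontcolor=='blue']{m += 1} \
N[fontcolor=='green']{e += 1} \
N[fontcolor=='red']{w += 1} \
END{print(m); print(w); print(e);}" \
""", shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)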
You can send the stdout/stderr to files like this:
from subprocess import Popen
std_out_file = "/tmp/stdout.log"
std_err_file = "/tmp/stderr.log"
command_to_execute = "<your-command>"
with open(std_out_file, "wb") as out, open(std_err_file, "wb") as err:
    p = Popen(command_to_execute, shell=True, cwd=<folder>, stdout=out, stderr=err)
    p.communicate()
Then you read the stdout/stderr from the files, for example:
f = open(std_out_file, "r")
stdout = f.readlines()
f.close()
You can check the return code of the command to decide whether you also need to print the stderr, for example:
if p.returncode == 0:
    print("Command succeeded")
else:
    with open(std_err_file) as f:
        print(f.read())

Script returns output in IDLE not when executed at CLI

I have a script that runs a few external commands and returns their values. I am able to run the script within IDLE just fine, and I get the expected results.
When I execute the script in a Linux shell (Bash), it runs, but with no output.
The exit status is 0.
#!/usr/bin/python
import array,os,subprocess
def vsdIndex(vmVsd):
    index = subprocess.call(["/root/indeces.sh", vmVsd], stdout=subprocess.PIPE, shell=TRUE).communicate()[0]
    print index
    return (firstSerial,lastSerial)
def main():
    vsdArray = []
    vsdProc = subprocess.Popen(["sc vsd show | tail -n +2 | grep -Ev 'iso|ISO|vfd' | awk '{{print $1}}'"], stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()[0]
    while True:
        line = vsdProc.stdout.readline()
        if line != '':
            vsdIndex(line)
            print "VSD:", line.rstrip()
            print firstSerial
            print lastSerial
        else:
            break
If I simplify it, and run without the function, I have the same behaviour:
def main():
    vsdArray = []
    vsdProc = subprocess.Popen(["sc vsd show | tail -n +2 | grep -Ev 'iso|ISO|vfd' | awk '{{print $1}}'"], stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()[0]
    while True:
        line = vsdProc.stdout.readline()
        if line != '':
            print "VSD:", line.rstrip()
        else:
            break
You need to call your main() function. Here is a common way of doing that automatically when you run on the command line:
if __name__ == "__main__":
    main()

Using grep in log file

How do I place grep in the below string? I can't seem to get it right.
p = subprocess.Popen(["tail", "-10", "/datax/qantas/run/tomcat/logs/localhost_access_log.2016-02-29.txt" "|" "grep /checkout-qantas/int/price?"], stdout=subprocess.PIPE)
gives me
tail: cannot open /datax/qantas/run/tomcat/logs/localhost_access_log.2016-02-29.txt|grep /checkout-qantas/int/price?' for reading: No such file or directory
shell=True and putting your complete command in quotes should help you:
p = subprocess.Popen('tail -10 /datax/qantas/run/tomcat/logs/localhost_access_log.2016-02-29.txt | grep /checkout-qantas/int/price?', stdout=subprocess.PIPE, shell = True)
You have used the shell pipe keyword (|), so you need to set shell=True:
subprocess.Popen("tail -10 ... | grep ....", shell=True)
If you want to save the output:
out = subprocess.check_output("tail -10 ... | grep ....", shell=True)
Better not to use shell=True; connect the processes yourself with subprocess.PIPE instead, e.g.:
>>> out = subprocess.Popen(['tail', '-10', '/var/log/syslog'], stdout=subprocess.PIPE)
>>> subprocess.check_call(['grep', 'UDP'], stdin=out.stdout)
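If you want the grep output in a variable rather than printed, check_output accepts the same stdin redirection (a small sketch of the same pipeline):
out = subprocess.Popen(['tail', '-10', '/var/log/syslog'], stdout=subprocess.PIPE)
matches = subprocess.check_output(['grep', 'UDP'], stdin=out.stdout)
out.stdout.close()  # lets tail receive SIGPIPE if grep exits first
out.wait()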

Invoking perl script with variable input and file output as arguments from python

I have a perl script that can be executed from the console as follows:
perl perlscript.pl -i input.txt -o output.txt --append
I want to execute this script from my python code. I figured out that subprocess.Popen can be used to connect to perl and I can pass my arguments with it. But, I also want to pass a variable (made by splitting up a text file) in place of input.txt.
I have tried the following but it doesn't seem to work and gives an obvious TypeError in line 8:
import re, shlex, subprocess, StringIO
f=open('fulltext.txt','rb')
text= f.read()
l = re.split('\n\n',str(text))
intxt = StringIO.StringIO()
for i in range(len(l)):
    intxt.write(l[i])
command_line='perl cnv_ltrfinder2gff.pl -i '+intxt+' -o output.gff --append'
args=shlex.split(command_line)
p = subprocess.Popen(args)
Is there any other workaround for this?
EDIT: Here is a sample of the file fulltext.txt. Entries are separated by a line.
Predict protein Domains 0.021 second
>Sequence: seq1 Len:13143
[1] seq1 Len:13143
Location : 9 - 13124 Len: 13116 Strand:+
Score : 6 [LTR region similarity:0.959]
Status : 11110110000
5'-LTR : 9 - 501 Len: 493
3'-LTR : 12633 - 13124 Len: 492
5'-TG : TG , TG
3'-CA : CA , CA
TSR : NOT FOUND
Sharpness: 1,1
Strand + :
PBS : [14/20] 524 - 543 (LysTTT)
PPT : [12/15] 12553 - 12567
Predict protein Domains 0.019 second
>Sequence: seq5 Len:11539
[1] seq5 Len:11539
Location : 7 - 11535 Len: 11529 Strand:+
Score : 6 [LTR region similarity:0.984]
Status : 11110110000
5'-LTR : 7 - 506 Len: 500
3'-LTR : 11036 - 11535 Len: 500
5'-TG : TG , TG
3'-CA : CA , CA
TSR : NOT FOUND
Sharpness: 1,1
Strand + :
PBS : [15/22] 515 - 536 (LysTTT)
PPT : [11/15] 11020 - 11034
I want to separate them and pass each entry block to the perl script. All the files are in the same directory.
You might be interested in the os module and string formatting.
Edit
I think I understand what you want now. Correct me if I am wrong, but I think:
You want to split your fulltext.txt into blocks.
Every block contains a seq(number)
You want to run your perl script once for every block, with your seq(number) file as the input.
If this is what you want, you could use the following code:
import os
in_file = 'fulltext.txt'
seq = []
with open(in_file, 'r') as handle:
    lines = handle.readlines()
for i in range(0, len(lines)):
    if lines[i].startswith(">"):
        seq.append(lines[i].rstrip().split(" ")[1])
for x in seq:
    command = "perl cnv_ltrfinder2gff.pl -i %s.txt -o output.txt --append" % x
    os.system(command)
The docs for the --infile option say:
Path of the input file. If an input file is not provided, the program
will expect input from STDIN.
You could omit --infile and pass input via a pipe (stdin) instead:
#!/usr/bin/env python
from subprocess import Popen, PIPE
with open('fulltext.txt') as file:  # read input data
    blocks = file.read().split('\n\n')
# run a separate perl process for each block
args = 'perl cnv_ltrfinder2gff.pl -o output.gff --append'.split()
for block in blocks:
    p = Popen(args, stdin=PIPE, universal_newlines=True)
    p.communicate(block)
    if p.returncode != 0:
        print('non-zero exit status: %s on block: %r' % (p.returncode, block))
You can run several perl scripts concurrently:
from multiprocessing.dummy import Pool  # use threads
def run((i, block)):
    filename = 'out%03d.gff' % i
    args = ['perl', 'cnv_ltrfinder2gff.pl', '-o', filename]
    p = Popen(args, stdin=PIPE, universal_newlines=True, close_fds=True)
    p.communicate(block)
    return p.returncode, filename
exit_statuses, filenames = zip(*Pool().map(run, enumerate(blocks, start=1)))
It runs several (equal to the number of CPUs on your system) child processes in parallel. You could specify a different number of worker threads (pass to Pool()).
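For example, to cap the pool at four worker threads (the 4 here is illustrative):
exit_statuses, filenames = zip(*Pool(4).map(run, enumerate(blocks, start=1)))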

Determine total size of SVN directory and store in python variable

This prints the directory size, but how can I save the output to a Python variable instead of printing it?
svn list -vR http://myIP/repos/test | awk '{sum+=$3; i++} END {print sum/1024000}'
but I need to store this output in a Python variable;
proc = subprocess.Popen(svnproc, stdout=subprocess.PIPE, shell=True)
output = proc.stdout.read()
print str(output)
A nasty workaround is to push it out to a file and cat the file:
svn list -vR http://myIP/repos/test | awk '{sum+=$3; i++} END {print sum/1024000 > "/tmp/output.txt"}'
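Reading that file back in Python is then straightforward (a minimal sketch, assuming the awk redirect above wrote to /tmp/output.txt):
with open('/tmp/output.txt') as f:
    size_mb = float(f.read().strip())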
From the fine docstring of "subprocess" I can read:
Replacing shell pipeline
output = `dmesg | grep hda`
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
so that in your case I'd try the following code
switches = ...
directory = ...
p1 = Popen(["svn", "list", switches, directory], stdout=PIPE)
p2 = Popen(["awk", "{sum+=$3; i++} END {print sum/1024/1024}", stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0].strip()
PS: I have changed sum/1024000 to sum/1024/1024, assuming that you want to count in megabytes.
"
svnproc = "svn list -vR " + repoURL + " | awk '{sum+=$3; i++} END {print sum/1073741824}'"
proc = subprocess.Popen(svnproc, shell=True, stdout=subprocess.PIPE)
svnbackupsize = float(proc.stdout.read())
The only problematic part of this script is that Popen does not wait until the process is done, whereas subprocess.call does wait for it to complete.
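If that is a concern, communicate() both waits for the child to exit and returns its output in one call (a sketch of the same snippet):
proc = subprocess.Popen(svnproc, shell=True, stdout=subprocess.PIPE)
out, _ = proc.communicate()  # blocks until svn and awk have finished
svnbackupsize = float(out)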
