Use awk in python script

I have to use awk print inside a Python script.
Below is the format I used:
a = commands.getoutput("ls -l | awk '{print $1, $2}' | awk '{if(NR>3)print}'")
I am getting the error below:
KeyError: 'print $1, $2'
Can someone help me fix it?

Your Awk scripts can be simplified to a single script trivially. There is nothing here which really calls for using Awk anyway.
from subprocess import Popen, PIPE
with Popen(["ls", "-l"], stdout=PIPE, universal_newlines=True) as proc:
    for index, line in enumerate(proc.stdout):
        if index <= 2:
            continue
        print(' '.join(line.split()[0:2]))


How to run an advanced Linux command from a Python script

I want to get the string output of the following linux command
systemctl show node_exporter |grep LoadState| awk '{split($0,a,"="); print a[2]}'
I tried with
import subprocess
output = subprocess.check_output("systemctl show node_exporter |grep LoadState| awk '{split($0,a,"="); print a[2]}'", shell=True)
but the output is,
output = subprocess.check_output("systemctl show node_exporter |grep LoadState| awk '{split($0,a,"="); print a[2]}'", shell=True)
SyntaxError: keyword can't be an expression
Well,
First of all, the function takes a list of strings as the command, not a single string. E.g.:
"ls -a -l" - wrong
["ls", "-a", "-l"] - good
Secondly, if the Linux command is very complex or spans many lines, it makes sense to create a separate bash file, e.g. command.sh, put your Linux commands there, and run the script from Python with:
import subprocess
output = subprocess.check_output(["./command.sh"])
You need to escape the double quotes (because they indicate the begin/end of the string):
import subprocess
output = subprocess.check_output("systemctl show node_exporter |grep LoadState| awk '{split($0,a,\"=\"); print a[2]}'", shell=True)

Bash commands in python [duplicate]

This question already has answers here:
How to use `subprocess` command with pipes
(7 answers)
Closed 4 years ago.
I am running code in Python which calculates the count of the files present in a directory:
hadoop fs -count /user/a909983/sample_data/ | awk '{print $2}'
This successfully returns 0 on the Linux command line, as the dir is empty. However, when I run this in a Python script it returns 1. The line of code in Python is:
directoryEmptyStatusCommand = subprocess.call(
    ["hadoop", "fs", "-count", "/user/a909983/sample_data/", "|", "awk '{print $2}'"])
How can I correct this, or what am I missing? I have also tried using Popen, but the result is the same.
Use subprocess.Popen and don't use the pipe |, because it requires shell=True, which is a security risk. Instead, use subprocess.PIPE and pass the first process's stdout to subprocess.check_output; that is the correct method, with no pipe involved.
So, you can try something like:
command = subprocess.Popen(("hadoop", "fs", "-count", "/user/a909983/sample_data/"), stdout=subprocess.PIPE)
output = subprocess.check_output(["awk", "{print $2}"], stdin=command.stdout)
In case you do want to run it as a shell command with shell=True enabled:
cmd = "hadoop fs -count /user/a909983/sample_data/ | awk '{print $2}'"
command = subprocess.Popen(cmd,shell=True,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
output = command.communicate()[0]
print(output)
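The same two-process pattern can be tried on any Linux box without hadoop installed; in this hedged sketch, printf stands in for `hadoop fs -count ...` and `wc -w` stands in for the awk field extraction:

```python
import subprocess

# Pipe-without-shell pattern: first process's stdout feeds the second's stdin.
p1 = subprocess.Popen(["printf", "a b c"], stdout=subprocess.PIPE)
output = subprocess.check_output(["wc", "-w"], stdin=p1.stdout)
p1.stdout.close()  # let p1 receive SIGPIPE if the reader exits early
p1.wait()
print(output.decode().strip())  # prints: 3
```

Note that each stage gets its arguments as a list, so no shell parsing (and no shell=True) is involved anywhere.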

Python subprocess awk command in a function, taking element from list as argument

I want to subprocess an awk command with a list element as argument.
With a single argument, and from a shell prompt, it's pretty simple:
$ awk -F "," '/US/ && /00001/ {print $1","$3}' stock_inventory.csv > pretest_00001.csv
So using a list, I put it all in a Python script like:
import subprocess
mylist = [00001, 00002, 00003]
def myawk(item_code):
    subprocess.call("awk -F "," '/US/ && /%d/ {print $1","$3}' stock_inventory.csv > pretest_%d.csv") % item_code
for i in mylist:
    myawk(i)
I did it wrong somewhere. Could Popen have been of any help?
What about lambda in this case?
Thanks for your help.
perhaps ditch python and do it all in awk?
$ awk 'BEGIN{FS=OFS=",";
             n=split("00001,00002,00003",item)}
       /US/ {for(i=1;i<=n;i++)
               if($0~item[i])
                 print $1,$3 > ("pretest_" item[i] ".csv")}' stock_inventory.csv
Thanks all for your help.
I found a more elegant way to resolve it, though it's not through a Python script but a Bash one instead.
#!/bin/bash
declare -a arr=("00001" "00002" "00003")
for i in "${arr[@]}"
do
    cat stock_inventory.csv | grep $i | grep US | awk -F "," '{print $1","$3}' > pretest_$i.csv
done
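For completeness, the same per-item filtering needs no subprocess at all; here is a hedged pure-Python sketch of it (the function name is mine, and item codes are passed as strings to keep the leading zeros):

```python
def write_pretests(inventory_path, item_codes):
    # For each code, write fields 1 and 3 of every comma-separated line
    # that contains both "US" and the code, mimicking
    # grep $i | grep US | awk -F "," '{print $1","$3}' > pretest_$i.csv
    with open(inventory_path) as f:
        lines = f.read().splitlines()
    for code in item_codes:
        with open("pretest_%s.csv" % code, "w") as out:
            for line in lines:
                if "US" in line and code in line:
                    fields = line.split(",")
                    out.write("%s,%s\n" % (fields[0], fields[2]))
```

Usage would be `write_pretests("stock_inventory.csv", ["00001", "00002", "00003"])`.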

How to correctly escape special characters in python subprocess?

I'm trying to run this bash command using the Python subprocess module:
find /Users/johndoe/sandbox -iname "*.py" | awk -F'/' '{ print $NF}'
output:
helld.xl.py
parse_maillog.py
replace_pattern.py
split_text_match.py
ssh_bad_login.py
Here is what I have done the python2.7 way, but it gives output where the awk command's filter is not working:
>>> p1=subprocess.Popen(["find","/Users/johndoe/sandbox","-iname","*.py"],stdout=subprocess.PIPE)
>>> p2=subprocess.Popen(['awk','-F"/"','" {print $NF} "'],stdin=p1.stdout,stdout=subprocess.PIPE)
>>>p2.communicate()
('/Users/johndoe/sandbox/argparse.py\n/Users/johndoe/sandbox/custom_logic_substitute.py\n/Users/johndoe/sandbox/finditer_html_parse.py\n/Users/johndoe/sandbox/finditer_simple.py\n/Users/johndoe/sandbox/group_regex.py\n/Users/johndoe/sandbox/helo.py\n/Users/johndoe/sandbox/newdir/helld.xl.py\n/Users/johndoe/sandbox/parse_maillog.py\n/Users/johndoe/sandbox/replace_pattern.py\n/Users/johndoe/sandbox/split_text_match.py\n/Users/johndoe/sandbox/ssh_bad_login.py\n', None)
I could also get the output by using p1 alone, like below, but I can't get awk working here:
list1=[]
result=p1.communicate()[0].split("\n")
for item in result:
    a=item.rstrip('/').split('/')
    list1.append(a[-1])
print list1
You are incorrectly passing in shell quoting (and extra shell quoting which isn't even required by the shell!) when you're not invoking a shell. Don't do that.
p2=subprocess.Popen(['awk', '-F/', '{print $NF}'], stdin=...
When you have shell=True you need extra quotes around some arguments to protect them from the shell, but there is no shell here, so putting them in is incorrect, and will cause parse errors by Awk.
However, you should almost never need to call Awk from Python, especially for trivial tasks which Python can easily do natively:
list1 = [line.split('/')[-1]
         for line in subprocess.check_output(
             ["find", "/Users/johndoe/sandbox",
              "-iname", "*.py"]).splitlines()]
In this particular case, note also that GNU find already has a facility to produce this result directly:
list1 = subprocess.check_output(
    ["find", "/Users/johndoe/sandbox",
     "-iname", "*.py", "-printf", "%f\\n"]).splitlines()
Use this: p2.communicate()[0].split("\n").
It will output a list of lines.
If you don't have any reservations about using shell=True, then this should be a pretty simple solution:
from subprocess import Popen
import subprocess
command='''
find /Users/johndoe/sandbox -iname "*.py" | awk -F'/' '{ print $NF}'
'''
process=Popen(command,shell=True,stdout=subprocess.PIPE)
result=process.communicate()
print result

Grep stops suddenly

I am trying to parse a file in Python using grep, but it always stops at the same line and I am unable to understand why. I tried three different ways:
process = os.popen("grep -A1 "+name+" "+new_hairpins+" | grep -oE '.{0,6}"+seq+".{0,6}'")
results = process.readlines()
process.close()
then
process = subprocess.Popen("grep -A1 "+name+" "+new_hairpins+" | grep -oE '.{0,6}"+seq+".{0,6}'",stdout=PIPE, stderr=PIPE, shell=True)
process.wait()
process_result = process.communicate()
results = filter(None, process_result[0].split("\n"))
and through a temp file
os.system("grep -A1 "+name+" "+new_hairpins+" | grep -oE '.{0,6}"+seq+".{0,6}' > tmp.txt")
with open("tmp.txt","r") as f :
results = f.readlines()
but the script always fails at the same line.
I manually tried that line directly in bash, and it worked!
So could it be a memory issue with grep, and how could I fix this problem?
Thanks a lot!
You need to quote the command line argument because there's a space in between:
"grep -A1 '"+name+" "+new_hairpins+"' | grep ....
          ^                          ^
Otherwise, name and new_hairpins will be treated as separate arguments.
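A safer way to build such a command is to run every interpolated value through shlex.quote (Python 3), so spaces and shell metacharacters can never split or inject arguments. A hedged sketch, with a made-up two-line data file standing in for new_hairpins:

```python
import shlex
import subprocess
import tempfile

# Sample values standing in for the question's name / new_hairpins / seq.
name, seq = "hsa-mir-21", "ACGT"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hsa-mir-21\nxxxACGTyyy\n")
    new_hairpins = f.name

# shlex.quote() wraps each value safely for the shell.
cmd = ("grep -A1 " + shlex.quote(name) + " " + shlex.quote(new_hairpins)
       + " | grep -oE " + shlex.quote(".{0,6}" + seq + ".{0,6}"))
results = subprocess.check_output(cmd, shell=True).decode().splitlines()
print(results)  # prints: ['xxxACGTyyy']
```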
I finally found the problem: my fault!
I realized that the file new_hairpins, which I use in the grep and which I generated just before in the code, wasn't closed with .close()...
As it was working on the first 1879 lines, I didn't think the problem could come from this file.
Anyway, thanks for your help guys!
