How do I make my awk command into a Python command

I have an awk command that works in bash, but I'm now trying to put it into a Python script.
I have tried both os.system and subprocess.call; both return the same error: sh: 1: Syntax error: "(" unexpected
os.system('awk \'FNR<=27{print;next} ++count%10==0{print;count}\' \'{0} > {1}\'.format(inputfile, outpufile)')
This awk command takes the large input file and creates an output file that keeps the first 27 lines of header; then, starting at line 28, it only takes every 10th line and writes it to the output file.
I'm using .format() because this runs inside a Python script where the input file is different every time it's run.
I've also tried
subprocess.call('awk \'FNR<=27{print;next} ++count%10==0{print;count}\' \'{0} > {1}\'.format(inputfile, outpufile)')
Both produce the same error as above. What am I missing?

As per the comment above, it is probably more Pythonic (and more manageable) to do this directly in Python.
But if you want to use awk, one way is to build the command string with your variable filenames added separately, by concatenation.
This works with a basic test text file:
import os

def awk_runner(inputfile, outputfile):
    cmd = "awk 'FNR<=27{print;next} ++count%10==0{print;count}' " + inputfile + " > " + outputfile
    os.system(cmd)

awk_runner('test1.txt', 'testout1.txt')
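Since doing this directly in Python is suggested above, here is a minimal sketch of the same filter without awk, assuming the rule from the question (keep the first 27 lines, then every 10th line after them). The function name thin_file and its parameters are hypothetical.

```python
from itertools import islice


def thin_file(inputfile, outputfile, header_lines=27, step=10):
    """Copy the header, then every `step`-th line after it.

    Intended to mirror: awk 'FNR<=27{print;next} ++count%10==0{print;count}'
    """
    with open(inputfile) as src, open(outputfile, "w") as dst:
        # Copy the first `header_lines` lines unchanged (awk's FNR<=27 branch).
        for line in islice(src, header_lines):
            dst.write(line)
        # awk's counter becomes 1 on line 28, so the 10th, 20th, ... lines
        # after the header (lines 37, 47, ...) are the ones written.
        for count, line in enumerate(src, start=1):
            if count % step == 0:
                dst.write(line)
```

It is called like awk_runner above, e.g. thin_file('test1.txt', 'testout1.txt'), and needs no shell quoting at all.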

There are two main issues with your Python code:
format() is a Python method call; it must be called on the command string in Python, not placed inside the string that the shell executes.
When calling format(), braces {} mark the substitution targets in the format string, so the literal braces in the awk program must be escaped as {{ ... }}.
Here is a modified version of your code:
import os

# {{ and }} become literal { and } after format(); {0} and {1} are the filenames
awk_cmd = "awk 'FNR<=27{{print;next}} ++count%10==0{{print;count}}' {0} > {1}".format(inputfile, outpufile)
os.system(awk_cmd)
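Both answers build one string for the shell, which breaks if a filename contains spaces or quotes. A sketch that avoids the shell entirely, assuming Python 3.5+ (for subprocess.run) and an awk on PATH: the awk program is passed as a single argument and Python performs the output redirection itself.

```python
import subprocess

# The awk program from the question, passed as one argv element: no shell
# parses it, so neither shell quoting nor {{ }} escaping is needed.
AWK_PROGRAM = "FNR<=27{print;next} ++count%10==0{print;count}"


def awk_runner(inputfile, outputfile):
    # Python opens the output file itself, replacing the shell's "> outputfile".
    with open(outputfile, "w") as out:
        # check=True raises CalledProcessError if awk exits with an error.
        subprocess.run(["awk", AWK_PROGRAM, inputfile], stdout=out, check=True)
```

Because no shell is involved, the "(" in .format( can never reach sh, which is what caused the original syntax error.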

Related

Python: how do I read (not run) a shell script, inserting arguments along the way

I have a simple shell script script.sh:
echo "ubuntu:$1" | sudo chpasswd
I need to open the script, read it, insert the argument, and save it as a string like so: 'echo "ubuntu:arg_passed_when_opening" | sudo chpasswd' using Python.
All the options suggested here actually execute the script, which is not what I want.
Any suggestions?
You would do this the same way you read any text file, and you can use sys.argv to get the argument passed when running the Python script.
For example:
import sys

with open('script.sh', 'r') as sfile:
    modified_file_contents = sfile.read().replace('$1', sys.argv[1])
With this method, modified_file_contents is a string containing the text of the file, but with the specified variable replaced by the argument passed to the Python script when it was run.
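If the script uses more than one positional parameter, the same idea can be generalized with a regular expression. A minimal sketch, assuming only the unbraced forms $1 to $9 need replacing (${1}, "$@" and similar are left alone); render_script is a hypothetical name.

```python
import re


def render_script(text, *args):
    """Replace shell positional parameters $1..$9 in `text` with `args`.

    Only the unbraced $1..$9 forms are handled; ${1}, "$@" and so on are left as-is.
    """
    def repl(match):
        index = int(match.group(1)) - 1
        # Keep the placeholder unchanged when no argument was supplied for it.
        return args[index] if index < len(args) else match.group(0)

    # A function (not a string) as the replacement stops re.sub from
    # interpreting backslashes that may appear in the arguments.
    return re.sub(r"\$([1-9])", repl, text)
```

With the script from the question, render_script(sfile.read(), sys.argv[1]) turns echo "ubuntu:$1" | sudo chpasswd into the same line with the argument in place of $1, without executing anything.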

Issues calling awk from within Python using subprocess.call

Having some issues calling awk from within Python. Normally, I'd do the following to call the command in awk from the command line.
Open up command line, in admin mode or not.
Change to the directory containing awk.exe, namely cd R\GnuWin32\bin
Call awk -F "," "{ print > (\"split-\" $10 \".csv\") }" large.csv
My command is used to split up the large.csv file based on the 10th column into a number of files named split-[COL VAL HERE].csv. I have no issues running this command. I tried to run the same code in Python using subprocess.call() but I'm having some issues. I run the following code:
def split_ByInputColumn():
    subprocess.call(['C:/R/GnuWin32/bin/awk.exe', '-F', '\",\"',
                     '\"{ print > (\\"split-\\" $10 \\".csv\\") }\"', 'large.csv'],
                    cwd = 'C:/R/GnuWin32/bin/')
Clearly something is running when I execute the function (CPU usage, etc.), but when I check C:/R/GnuWin32/bin/ there are no split files in the directory. Any idea what's going wrong?
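As I stated in my previous answer that was downvoted, you overprotect the arguments, so awk does not receive the arguments you intended. With a list argument there is no shell to strip the quotes, so awk gets them literally: the field separator becomes the three-character string "," and the program becomes a single quoted string constant, which awk treats as an always-true pattern with no action. Every line is therefore printed to standard output instead of into split files.
Since there was no comment, I assumed there was a typo, but it worked... So I suppose the downvote was because I should have strongly suggested a full-fledged Python solution, which is the best thing to do here (as stated in my previous answer).
Writing the equivalent in Python is not trivial, as we have to emulate the way awk opens files and appends to them afterwards. But it is more integrated and Pythonic, and it handles quoting properly if quoting occurs in the input file.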
I took the time to code & test it:
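import csv
import glob
import os

def split_ByInputColumn():
    # get rid of the old data from previous runs
    for old in glob.glob("split-*.csv"):
        os.remove(old)
    open_files = dict()
    # Python 3: open CSV files with newline='' (Python 2 used "rb"/"wb")
    with open('large.csv', newline='') as f:
        cr = csv.reader(f, delimiter=',')
        for r in cr:
            # 10th column; like awk's $10, empty if the row is shorter
            tenth_col = r[9] if len(r) > 9 else ""
            filename = "split-{}.csv".format(tenth_col)
            if filename not in open_files:
                handle = open(filename, "w", newline='')
                open_files[filename] = (handle, csv.writer(handle, delimiter=','))
            open_files[filename][1].writerow(r)
    # close every output file opened above
    for handle, _ in open_files.values():
        handle.close()

split_ByInputColumn()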
In detail:
read the big file as CSV (advantage: quoting is handled properly)
compute the destination filename from the 10th column
if the filename is not in the dictionary yet, open it and create a csv.writer object
write the row with the corresponding writer from the dictionary
at the end, close the file handles
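The last step only runs if nothing fails earlier. A variant of the same steps, sketched with contextlib.ExitStack so that every output file is closed even if an exception interrupts the loop; the function name split_by_column and its parameters are hypothetical.

```python
import csv
import os
from contextlib import ExitStack


def split_by_column(path, column=9, out_dir="."):
    """Write each row of `path` to split-<value>.csv keyed on `column` (0-based).

    Returns the sorted list of distinct key values that were seen.
    """
    writers = {}
    # ExitStack closes every output file on exit, even if an exception
    # interrupts the loop partway through.
    with ExitStack() as stack, open(path, newline="") as src:
        for row in csv.reader(src):
            # Like awk's $10, use an empty key when the row is too short.
            key = row[column] if len(row) > column else ""
            if key not in writers:
                name = os.path.join(out_dir, "split-{}.csv".format(key))
                handle = stack.enter_context(open(name, "w", newline=""))
                writers[key] = csv.writer(handle)
            writers[key].writerow(row)
    return sorted(writers)
```

As with awk, one file stays open per distinct value, so a column with a very large number of distinct values may run into the operating system's open-file limit.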
Aside: My old solution, using awk properly:
import subprocess

def split_ByInputColumn():
    subprocess.call(['awk.exe', '-F', ',',
                     '{ print > ("split-" $10 ".csv") }', 'large.csv'], cwd = 'some_directory')
Someone else posted an answer (and later deleted it), but the issue was that I was over-protecting my arguments. The following code works:
def split_ByInputColumn():
    subprocess.call(['C:/R/GnuWin32/bin/awk.exe', '-F', ',',
                     '{ print > (\"split-\" $10 \".csv\") }', 'large.csv'],
                    cwd = 'C:/R/GnuWin32/bin/')
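One way to see what "over-protecting" means is to let shlex.split apply POSIX shell quoting rules to the command line and look at the argument list a shell would hand to awk; that list is what subprocess.call expects. This is a sketch for a POSIX shell only: cmd.exe quoting rules differ, so on Windows treat it as an illustration.

```python
import shlex

# The command line as it would be typed in a POSIX shell (sh/bash).
command = """awk -F "," '{ print > ("split-" $10 ".csv") }' large.csv"""

# shlex.split applies POSIX shell quoting rules and returns the argv list
# the shell would pass to awk, i.e. what subprocess.call needs.
argv = shlex.split(command)
print(argv)
# ['awk', '-F', ',', '{ print > ("split-" $10 ".csv") }', 'large.csv']
```

The shell removes the outer quotes before awk runs, which is why the working list above contains a bare ',' and an unquoted awk program.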

How to use awk if statement and for loop in subprocess.call

I'm trying to print the names of files that don't have 12 columns.
This works at the command line:
for i in *dim*; do awk -F',' '{if (NR==1 && NF!=12)print FILENAME}' $i; done;
When I try to embed this in subprocess.call in a python script, it doesn't work:
subprocess.call("""for %i in (*dim*.csv) do (awk -F, '{if ("NR==1 && NF!=12"^) {print FILENAME}}' %i)""", shell=True)
The first error I received was "Print is unexpected at this time", so I googled and added ^ within the parentheses. The next error was "unexpected newline or end of string", so I googled again and added the quotes around NR==1 && NF!=12. With the current code it prints many lines for each file, so I suspect something is wrong with the if statement. I've used awk and for loops in this style in subprocess.call before, but not combined and not with an if statement.
Multiple input files in AWK
In the string you are passing to subprocess.call(), your if statement evaluates the string constant "NR==1 && NF!=12", not the comparison you want. A non-empty string is always true in AWK, so FILENAME is printed for every line. It might be easier to simplify the shell command by doing everything in AWK. You are running AWK once for every $i in the shell's for loop. Since you can give multiple input files to AWK, there is really no need for this loop.
You might want to scan each entire file for any line that has other than 12 fields, rather than only checking the first line (NR==1). In that case, the condition would be just NF!=12.
If you want to check only the first line of each file, then NR==1 becomes FNR==1 when using multiple files. NR is the "number of records" (across all input files) and FNR is "file number of records" for the current input file only. These are special built-in variables in AWK.
Also, the syntax of AWK allows for the blocks to be executed only if the line matches some condition. Giving no condition (as you did) runs the block for every line. For example, to scan through all files given to AWK and print the name of a file with other than 12 fields on the first line, try:
awk -F, 'FNR==1 && NF!=12{print FILENAME; nextfile}' *dim*.csv
I have added .csv to your wildcard *dim*, as you had in the Python version. The -F, changes the field separator from the default (whitespace) to a comma. For every line of every file, AWK checks the condition: if the line is the first of its file (FNR==1) and its number of fields NF is not 12, it executes the block of code; otherwise it goes on to the next line. This block prints the FILENAME of the current file AWK is processing, then skips to the beginning of the next file with nextfile.
Try running this AWK version with your subprocess module in Python:
subprocess.call("""awk -F, 'FNR==1 && NF!=12{print FILENAME; nextfile}' *dim*.csv""", shell=True)
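The triple quotes let the string contain both single and double quotes without escaping. The output of AWK goes to stdout, and I'm assuming you know how to capture it in Python with the subprocess module. Note that this command relies on a POSIX shell such as sh or bash for the single quotes and the *dim*.csv wildcard; the for %i syntax in your question suggests Windows, where shell=True runs cmd.exe, which does not treat single quotes as quoting characters.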
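To avoid depending on any shell, Python can expand the wildcard itself and pass the file list to awk directly. A sketch assuming Python 3.5+ and an awk on PATH; the function name files_with_bad_first_line is hypothetical, and nextfile is omitted because some older awk implementations lack it (FNR==1 already limits the check to each file's first line).

```python
import glob
import subprocess


def files_with_bad_first_line(pattern="*dim*.csv", expected=12):
    """Return files whose first line has other than `expected` comma-separated fields."""
    # Python expands the wildcard, so no shell (sh or cmd.exe) is involved.
    files = sorted(glob.glob(pattern))
    if not files:
        return []  # with no file arguments awk would read stdin instead
    program = "FNR==1 && NF!=%d {print FILENAME}" % expected
    result = subprocess.run(["awk", "-F,", program] + files,
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    return result.stdout.splitlines()
```

The awk program is the same condition as above, and the returned list holds the filenames awk printed, one per line of its output.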
Using only Python
Don't forget that Python is itself an expressive and powerful language. If you are already using Python, it may be simpler, easier, and more portable to use only Python instead of a mixture of Python, bash, and AWK.
You can find the names of files (selected from *dim*.csv) with the first line of each file having other than 12 comma-separated fields with:
import glob

files_found = []
for filename in glob.glob('*dim*.csv'):
    with open(filename, 'r') as f:
        firstline = f.readline()
        if len(firstline.split(',')) != 12:
            files_found.append(filename)
print(files_found)
The glob module gives the listing of files matching the wildcard pattern *dim*.csv. The first line of each of these files is read and split into fields separated by commas. If the number of these fields is not 12, the filename is added to the list files_found. The with block closes each file, so no explicit f.close() is needed.
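For the other variant mentioned above (flag a file if any line, not just the first, has other than 12 fields, i.e. awk's NF!=12), a minimal Python sketch could look like this; files_with_wrong_width is a hypothetical name, and like awk -F, it splits on every comma without honoring CSV quoting.

```python
import glob


def files_with_wrong_width(pattern="*dim*.csv", expected=12):
    """Return files in which any line has other than `expected` comma-separated fields.

    Like awk -F, this splits on every comma and ignores CSV quoting.
    """
    found = []
    for filename in sorted(glob.glob(pattern)):
        with open(filename) as f:
            # any() stops reading the file at the first line that fails the check.
            if any(len(line.split(",")) != expected for line in f):
                found.append(filename)
    return found
```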

Taking the results of a bash command and using it in Python

I am trying to write Python code that takes some information from top and puts it into a file.
I just want to type the name of the application and generate the file. The problem I'm having is that I can't get the output of the pidof command so I can use it in Python. My code looks like this:
import os
a = input('Name of the application')
val=os.system('pidof ' + str(a))
os.system('top -d 30 | grep' + str(val) + '> test.txt')
os.system('awk '{print $10, $11}' test.txt > test2.txt')
The problem is that val is always 0, even though the command prints the PID I want. Any input would be great.
First up, in Python 2 the use of input() is discouraged, as it evaluates what the user types as a Python expression. Use raw_input() instead (in Python 3, input() already returns a plain string):
app = raw_input('Name of the application: ')
Next up, the return value from system('pidof') isn't the PID, it's the exit code from the pidof command, i.e. zero on success, non-zero on failure. You want to capture the output of pidof.
import subprocess
# int() below assumes pidof prints a single PID
# Python 2.7+ (including Python 3)
pid = int(subprocess.check_output(['pidof', app]))
# Python 2.4+
pid = int(subprocess.Popen(['pidof', app], stdout=subprocess.PIPE).communicate()[0])
# Older (deprecated)
pid = int(os.popen('pidof ' + app).read())
The next line is missing a space after the grep and would have resulted in a command like grep1234. Using the string formatting operator % will make this a little easier to spot:
os.system('top -d 30 | grep %d > test.txt' % (pid))
The third line is badly quoted and should have caused a syntax error. Watch out for the single quotes inside of single quotes.
os.system("awk '{print $10, $11}' test.txt > test2.txt")
Instead of os.system, I recommend using the subprocess module: http://docs.python.org/library/subprocess.html#module-subprocess
With that module you can exchange input and output with the child process. The documentation explains the details of how to use it.
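As an illustration of that approach, here is a sketch that replaces the os.system chain with subprocess and does the awk step in Python. It assumes Linux procps top, where -b (batch mode), -n (number of iterations) and -p (PID list) exist, and it takes a single snapshot instead of the 30-second -d loop. Which columns are fields 10 and 11 depends on top's configuration; parse_top and top_fields are hypothetical names.

```python
import subprocess


def parse_top(out, pids):
    """Return (field 10, field 11) for each process row of top's output whose PID is in `pids`."""
    rows = []
    for line in out.splitlines():
        fields = line.split()
        # Process rows start with the PID; the header and summary lines do not.
        if len(fields) >= 11 and fields[0] in pids:
            rows.append((fields[9], fields[10]))  # awk's $10 and $11
    return rows


def top_fields(app):
    """One top snapshot for every PID that pidof reports for `app`."""
    # pidof may print several space-separated PIDs; it exits non-zero if none,
    # which makes check_output raise CalledProcessError.
    pids = subprocess.check_output(["pidof", app], universal_newlines=True).split()
    # -b: batch mode (safe when stdout is a pipe); -n 1: one iteration, then exit.
    out = subprocess.check_output(["top", "-b", "-n", "1", "-p", ",".join(pids)],
                                  universal_newlines=True)
    return parse_top(out, pids)
```

Filtering by the PID column also avoids a problem with grep on the whole line: a PID like 123 would match any line that merely contains those digits.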
Hope this helps!

Python subprocess to call Unix commands, a question about how output is stored

I am writing a python script that reads a line/string, calls Unix, uses grep to search a query file for lines that contain the string, and then prints the results.
from subprocess import call
for line in infilelines:
    output = call(["grep", line, "path/to/query/file"])
    print output
    print line
When I look at my results printed to the screen, I will get a list of matching strings from the query file, but I will also get "1" and "0" integers as output, and line is never printed to the screen. I expect to get the lines from the query file that match my string, followed by the string that I used in my search.
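call returns the process's return code, which is where the 0 and 1 come from: grep exits with 0 when it finds a match and 1 when it doesn't, while the matching lines themselves go straight to the terminal.
If using Python 2.7 or later, use check_output. Note that check_output raises CalledProcessError whenever grep finds no match, because grep then exits with status 1.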
from subprocess import check_output
output = check_output(["grep", line, "path/to/query/file"])
If using anything before that, use communicate.
import subprocess
process = subprocess.Popen(["grep", line, "path/to/query/file"], stdout=subprocess.PIPE)
output = process.communicate()[0]
This opens a pipe for stdout that you can read with communicate. If you also want stderr, add stderr=subprocess.PIPE.
This will return the full output. If you want to parse it into separate lines, use split.
output.split('\n')
I believe Python takes care of line-ending conversions for you, but since you're using grep I'm going to assume you're on Unix where the line-ending is \n anyway.
http://docs.python.org/library/subprocess.html#subprocess.check_output
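To keep the convenience of check_output without the exception on "no match", the exit status can be checked by hand. A sketch assuming Python 3.5+ (subprocess.run) and grep on PATH; grep_output is a hypothetical name, and -F (fixed-string matching instead of grep's default basic regular expressions) is a deliberate swap to suit plain search strings.

```python
import subprocess


def grep_output(pattern, path):
    """Run grep and return its matching lines; an empty list means no match."""
    # "-F" treats the pattern as a fixed string; "--" ends options so a pattern
    # starting with "-" is not parsed as a flag.
    result = subprocess.run(["grep", "-F", "--", pattern, path],
                            stdout=subprocess.PIPE, universal_newlines=True)
    if result.returncode > 1:  # grep: 0 = match, 1 = no match, >1 = error
        raise subprocess.CalledProcessError(result.returncode, result.args)
    return result.stdout.splitlines()
```

Lines read from a file keep their trailing newline, so strip each search string (line.rstrip("\n")) before passing it in.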
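The following code works with Python 2.5+ (Python 2 only: the commands module was removed in Python 3, where subprocess.getoutput is the equivalent). Be aware that line is pasted into the shell command unquoted: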
from commands import getoutput
output = getoutput('grep %s path/to/query/file' % line)
output_list = output.splitlines()
Why would you execute an external grep when Python itself can do it? It adds overhead, and your code then depends on grep being installed. This is how you do a simple grep in Python with the "in" operator:
# read the query file once, stripping line endings
with open("/path/to/query/file") as qf:
    query = [i.rstrip() for i in qf]

with open("file") as f:
    for line in f:
        line = line.rstrip()
        # print every line of "file" that also appears in the query file
        if line in query:
            print(line)
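The loop above checks for whole-line equality. To reproduce what the question actually asks (print the lines of the query file that contain a given string, as grep does), a substring test with "in" works; grep_lines is a hypothetical name, and the match is a plain substring test like grep -F, not a regular expression.

```python
def grep_lines(pattern, path):
    """Return the lines of `path` that contain `pattern` as a plain substring."""
    matches = []
    with open(path) as f:
        for raw in f:
            line = raw.rstrip("\n")
            if pattern in line:
                matches.append(line)
    return matches
```

In the question's loop this becomes: for each search string s (stripped of its newline), print the results of grep_lines(s, "path/to/query/file") and then s itself.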
