python variables in awk - python

I like python and I like awk too, and I know that can use it via subprocess or command library, BUT I want to use awk with variables defined before in python, like this simple example:
file = 'file_i_want_read.list'
awk '{print $0}' file > another_file
anybody know how can I do it or something similar?

The easy way to do this is to not use the shell, and instead just pass a list of arguments to subprocess, so file is just one of those arguments.
The only trick is that if you don't use the shell, you can't use shell features like redirection; you have to use the equivalent subprocess features. Like this:
with open('another_file', 'wb') as output:
subprocess.check_call(['awk', '{print $0}', file], stdout=output)
If you really want to use shell redirection instead, then you have to build a shell command line. That's mainly just a matter of using your favorite Python string manipulation methods. But you need to be careful to make sure to quote and/or escape thingsā€”e.g., if file might be file i want read.list, then that will show up as 4 separate arguments unless you put it in quotes. shlex.quote can do that for you. So:
cmdline = "awk '{print $0}' %s > another_file" % (shlex.quote(file),)
subprocess.check_call(cmdline, shell=True)

Related

subprocess.call with command having embedded spaces and quotes

I would like to retrieve output from a shell command that contains spaces and quotes. It looks like this:
import subprocess
cmd = "docker logs nc1 2>&1 |grep mortality| awk '{print $1}'|sort|uniq"
subprocess.check_output(cmd)
This fails with "No such file or directory". What is the best/easiest way to pass commands such as these to subprocess?
The absolutely best solution here is to refactor the code to replace the entire tail of the pipeline with native Python code.
import subprocess
from collections import Counter
s = subprocess.run(
["docker", "logs", "nc1"],
text=True, capture_output=True, check=True)
count = Counter()
for line in s.stdout.splitlines():
if "mortality" in line:
count[line.split()[0]] += 1
for count, word in count.most_common():
print(count, word)
There are minor differences in how Counter objects resolve ties (if two words have the same count, the one which was seen first is returned first, rather than by sort order), but I'm guessing that's unimportant here.
I am also ignoring standard output from the subprocess; if you genuinely want to include output from error messages, too, just include s.stderr in the loop driver too.
However, my hunch is that you don't realize your code was doing that, which drives home the point nicely: Mixing shell script and Python raises the mainainability burden, because now you have to understand both shell script and Python to understand the code.
(And in terms of shell script style, I would definitely get rid of the useless grep by refactoring it into the Awk script, and probably also fold in the sort | uniq which has a trivial and more efficient replacement in Awk. But here, we are replacing all of that with Python code anyway.)
If you really wanted to stick to a pipeline, then you need to add shell=True to use shell features like redirection, pipes, and quoting. Without shell=True, Python looks for a command whose file name is the entire string you were passing in, which of course doesn't exist.

How to correctly escape special characters in python subprocess?

Im trying to run this bash command using python subprocess
find /Users/johndoe/sandbox -iname "*.py" | awk -F'/' '{ print $NF}'
output:-
helld.xl.py
parse_maillog.py
replace_pattern.py
split_text_match.py
ssh_bad_login.py
Here is what i have done in python2.7 way, but it gives the output where awk command filter is not working
>>> p1=subprocess.Popen(["find","/Users/johndoe/sandbox","-iname","*.py"],stdout=subprocess.PIPE)
>>> p2=subprocess.Popen(['awk','-F"/"','" {print $NF} "'],stdin=p1.stdout,stdout=subprocess.PIPE)
>>>p2.communicate()
('/Users/johndoe/sandbox/argparse.py\n/Users/johndoe/sandbox/custom_logic_substitute.py\n/Users/johndoe/sandbox/finditer_html_parse.py\n/Users/johndoe/sandbox/finditer_simple.py\n/Users/johndoe/sandbox/group_regex.py\n/Users/johndoe/sandbox/helo.py\n/Users/johndoe/sandbox/newdir/helld.xl.py\n/Users/johndoe/sandbox/parse_maillog.py\n/Users/johndoe/sandbox/replace_pattern.py\n/Users/johndoe/sandbox/split_text_match.py\n/Users/johndoe/sandbox/ssh_bad_login.py\n', None)
I could also get output by using p1 alone here like below,but i cant get the awk working here
list1=[]
result=p1.communicate()[0].split("\n")
for item in res:
a=item.rstrip('/').split('/')
list1.append(a[-1])
print list1
You are incorrectly passing in shell quoting (and extra shell quoting which isn't even required by the shell!) when you're not invoking a shell. Don't do that.
p2=subprocess.Popen(['awk', '-F/', '{print $NF}'], stdin=...
When you have shell=True you need extra quotes around some arguments to protect them from the shell, but there is no shell here, so putting them in is incorrect, and will cause parse errors by Awk.
However, you should almost never need to call Awk from Python, especially for trivial tasks which Python can easily do natively:
list1 = [line.split('/')[-1]
for line in subprocess.check_output(
["find", "/Users/johndoe/sandbox",
"-iname", "*.py"]).splitlines()]
In this particular case, note also that GNU find already has a facility to produce this result directly:
list1 = subprocess.check_output(
["find", "/Users/johndoe/sandbox",
"-iname", "*.py", "-printf", "%f\\n"]).splitlines()
Use this: p2.communicate()[0].split("\n").
It will output a list of lines.
if you don't have any reservation using shell=True , then this should be pretty simple solution
from subprocess import Popen
import subprocess
command='''
find /Users/johndoe/sandbox -iname "*.py" | awk -F'/' '{ print $NF}'
'''
process=Popen(command,shell=True,stdout=subprocess.PIPE)
result=process.communicate()
print result

Executing awk command from python

I am trying to execute the following awk command from a python script
awk 'BEGIN {FS="\t"}; {print $1"\t"$2}' file_a > file_b
For this, I tried to use subprocess as follows:
subprocess.check_output(["awk", 'BEGIN {FS="\t"}; {print $1"\t"$2}',
file_a, ">",
file_b])
where file_a and file_b are strings pointing to the path of the files.
From this, I am getting the error
awk: cannot open > (No such file or directory)
I'm sure I'm inputing the arguments to subprocess in a wrong way, but I can't figure out what's wrong.
While it may look like it in your shell of choice, >, <, and | are not actually passed as arguments to the program you run. Rather, they're a special part of the shell that the program never gets to see.
Since they're part of the shell, and not part of the OS or program, you have to emulate their effects yourself with the normal facilities the language gives you. In your case, since you're trying to pipe to a file, simply use Python's open() as you would normally. The subprocess API supports arguments to specify stdout, stdin, and stderr, and you can supply any file object for those.
Check it out:
with open(file_b, 'wb') as f:
subprocess.call(["awk", 'BEGIN {FS="\t"}; {print $1"\t"$2}', file_a], stdout=f)
Since subprocess.check_output redirects output already, it doesn't take the stdout argument. Using subprocess.call avoids this. If you also need the output later in the script, you can instead assign the return value of check_output to a variable, and then save that to file_b.
If you use a lot of shell commands, you might also want to check out Plumbum, which gives you a large set of fairly silly shell-like operator overloads.

Converting a file from .sam to .bam using python subprocess

I would like to start out by saying any help is greatly appreciated. I'm new to Python and scripting in general. I am trying to use a program called samtools view to convert a file from .sam to a .bam I need to be able do what this BASH command is doing in Python:
samtools view -bS aln.sam > aln.bam
I understand that BASH commands like | > < are done using the subprocess stdin, stdout and stderr in Python. I have tried a few different methods and still can't get my BASH script converted correctly. I have tried:
cmd = subprocess.call(["samtools view","-bS"], stdin=open(aln.sam,'r'), stdout=open(aln.bam,'w'), shell=True)
and
from subprocess import Popen
with open(SAMPLE+ "."+ TARGET+ ".sam",'wb',0) as input_file:
with open(SAMPLE+ "."+ TARGET+ ".bam",'wb',0) as output_file:
cmd = Popen([Dir+ "samtools-1.1/samtools view",'-bS'],
stdin=(input_file), stdout=(output_file), shell=True)
in Python and am still not getting samtools to convert a .sam to a .bam file. What am I doing wrong?
Abukamel is right, but in case you (or others) are wondering about your specific examples....
You're not too far off with your first attempt, just a few minor items:
Filenames should be in quotes
samtools reads from a named input file, not from stdin
You don't need "shell=True" since you're not using shell tricks like redirection
So you can do:
import subprocess
subprocess.call(["samtools", "view", "-bS", "aln.sam"],
stdout=open('aln.bam','w'))
Your second example has more or less the same issues, so would need to be changed to something like:
from subprocess import Popen
with open('aln.bam', 'wb',0) as output_file:
cmd = Popen(["samtools", "view",'-bS','aln.sam'],
stdout=(output_file))
You can pass execution to the shell by kwarg 'shell=True'
subprocess.call('samtools view -bS aln.sam > aln.bam', shell=True)

Formatting a command in python subprocess popen

I am trying to format the following awk command
awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt
for use in python subprocess popen. However i am having a hard time formatting it. I have tried solutions suggested in similar answers but none of them worked. I have also tried using raw string literals. Also i would not like to use shell=True as this is not recommended
Edit according to comment:
The command i tried was
awk_command = """awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt"""
command_execute = Popen(shlex.split(awk_command))
However i get the following error upon executing this
KeyError: 'printf "chr%s\t%s\t%s\n", $1, $2-1, $2'
googling the error suggests this happens when a value is requested for an undefined key but i do not understand its context here
> is the shell redirection operator. To implement it in Python, use stdout parameter:
#!/usr/bin/env python
import shlex
import subprocess
cmd = r"""awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}'"""
with open('file2.txt', 'wb', 0) as output_file:
subprocess.check_call(shlex.split(cmd) + ["file1.txt"], stdout=output_file)
To avoid starting a separate process, you could implement this particular awk command in pure Python.
The simplest method, especially if you wish to keep the output redirection stuff, is to use subprocess with shell=True - then you only need to escape Python special characters. The line, as a whole, will be interpreted by the default shell.
WARNING: do not use this with untrusted input without sanitizing it first!
Alternatively, you can replace the command line with an argv-type sequence and feed that to subprocess instead. Then, you need to provide stuff as the program would see it:
remove all the shell-level escaping
remove the output redirection stuff and do the redirection yourself instead
Regarding the specific problems:
you didn't escape Python special characters in the string so \t and \n became the literal tab and newline (try to print awk_command)
using shlex.split is nothing different from shell=True - with an added unreliability since it cannot guarantee if would parse the string the same way your shell would in every case (not to mention the lack of transmutations the shell makes).
Specifically, it doesn't know or care about the special meaning of the redirection part:
>>> awk_command = """awk -v OFS="\\t" '{printf "chr%s\\t%s\\t%s\\n", $1, $2- 1, $2}' file1.txt > file2.txt"""
>>> shlex.split(awk_command)
['awk','-v','OFS=\\t','{printf "chr%s\\t%s\\t%s\\n", $1, $2-1, $2}','file1.txt','>','file2.txt']
So, if you wish to use shell=False, do construct the argument list yourself.

Categories

Resources