Formatting a command in python subprocess popen

I am trying to format the following awk command
awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt
for use with Python's subprocess.Popen. However, I am having a hard time formatting it. I have tried the solutions suggested in similar answers, but none of them worked. I have also tried using raw string literals. I would also prefer not to use shell=True, as it is not recommended.
Edit according to comment:
The command I tried was
awk_command = """awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}' file1.txt > file2.txt"""
command_execute = Popen(shlex.split(awk_command))
However, I get the following error when executing it:
KeyError: 'printf "chr%s\t%s\t%s\n", $1, $2-1, $2'
Googling the error suggests that it occurs when a value is requested for an undefined key, but I do not understand what that means in this context.

> is the shell redirection operator. To implement it in Python, use stdout parameter:
#!/usr/bin/env python
import shlex
import subprocess
cmd = r"""awk -v OFS="\t" '{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}'"""
with open('file2.txt', 'wb', 0) as output_file:
    subprocess.check_call(shlex.split(cmd) + ["file1.txt"], stdout=output_file)
To avoid starting a separate process, you could implement this particular awk command in pure Python.
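As a rough sketch of what such a pure-Python version might look like: the helper names to_bed_line and convert are made up for illustration, and the code assumes that the second column always holds an integer and that blank lines can be skipped.

```python
def to_bed_line(line):
    # Split on runs of whitespace, as awk does by default.
    fields = line.split()
    chrom, pos = fields[0], int(fields[1])  # assumption: $2 is an integer
    # Same layout as the awk printf: "chr" + $1, then $2-1, then $2.
    return "chr{}\t{}\t{}\n".format(chrom, pos - 1, pos)


def convert(src_path, dst_path):
    # Hypothetical helper: read src_path, write the converted lines to dst_path.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():  # assumption: blank lines are skipped
                dst.write(to_bed_line(line))
```

Calling convert("file1.txt", "file2.txt") would then take the place of both the awk call and the redirection.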

The simplest method, especially if you wish to keep the output redirection stuff, is to use subprocess with shell=True - then you only need to escape Python special characters. The line, as a whole, will be interpreted by the default shell.
WARNING: do not use this with untrusted input without sanitizing it first!
Alternatively, you can replace the command line with an argv-type sequence and feed that to subprocess instead. Then, you need to provide stuff as the program would see it:
remove all the shell-level escaping
remove the output redirection stuff and do the redirection yourself instead
Regarding the specific problems:
You didn't escape Python special characters in the string, so \t and \n became a literal tab and newline (try printing awk_command).
Using shlex.split is no different from shell=True, with added unreliability, since it cannot guarantee that it will parse the string the same way your shell would in every case (not to mention that it performs none of the transformations the shell makes).
Specifically, it doesn't know or care about the special meaning of the redirection part:
>>> awk_command = """awk -v OFS="\\t" '{printf "chr%s\\t%s\\t%s\\n", $1, $2-1, $2}' file1.txt > file2.txt"""
>>> shlex.split(awk_command)
['awk', '-v', 'OFS=\\t', '{printf "chr%s\\t%s\\t%s\\n", $1, $2-1, $2}', 'file1.txt', '>', 'file2.txt']
So, if you wish to use shell=False, do construct the argument list yourself.
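A minimal sketch of building the list by hand might look like this. The names AWK_PROGRAM, awk_argv and run_to_file are hypothetical helpers, not part of subprocess; the file names come from the question.

```python
import subprocess

# The awk program exactly as awk should receive it: no shell quoting, and a raw
# string so that Python passes \t and \n through for awk to interpret.
AWK_PROGRAM = r'{printf "chr%s\t%s\t%s\n", $1, $2-1, $2}'


def awk_argv(input_path):
    # One list element per argument; nothing here is parsed by a shell.
    return ["awk", "-v", r"OFS=\t", AWK_PROGRAM, input_path]


def run_to_file(argv, output_path):
    # Stands in for the shell's "> output_path": the child's stdout is the file.
    with open(output_path, "wb") as output_file:
        subprocess.check_call(argv, stdout=output_file)
```

With these, run_to_file(awk_argv("file1.txt"), "file2.txt") should be equivalent to the original command line, assuming awk is on the PATH.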

Related

subprocess.call with command having embedded spaces and quotes

I would like to retrieve output from a shell command that contains spaces and quotes. It looks like this:
import subprocess
cmd = "docker logs nc1 2>&1 |grep mortality| awk '{print $1}'|sort|uniq"
subprocess.check_output(cmd)
This fails with "No such file or directory". What is the best/easiest way to pass commands such as these to subprocess?
The absolutely best solution here is to refactor the code to replace the entire tail of the pipeline with native Python code.
import subprocess
from collections import Counter
s = subprocess.run(
    ["docker", "logs", "nc1"],
    text=True, capture_output=True, check=True)
count = Counter()
for line in s.stdout.splitlines():
    if "mortality" in line:
        count[line.split()[0]] += 1
# most_common() yields (word, occurrences) pairs, most frequent first
for word, occurrences in count.most_common():
    print(occurrences, word)
There are minor differences in how Counter objects resolve ties (if two words have the same count, the one which was seen first is returned first, rather than by sort order), but I'm guessing that's unimportant here.
I am also ignoring standard error from the subprocess; if you genuinely want to include output from error messages too, just include s.stderr in the loop driver as well.
However, my hunch is that you don't realize your code was doing that, which drives home the point nicely: mixing shell script and Python raises the maintainability burden, because now you have to understand both shell script and Python to understand the code.
(And in terms of shell script style, I would definitely get rid of the useless grep by refactoring it into the Awk script, and probably also fold in the sort | uniq which has a trivial and more efficient replacement in Awk. But here, we are replacing all of that with Python code anyway.)
If you really wanted to stick to a pipeline, then you need to add shell=True to use shell features like redirection, pipes, and quoting. Without shell=True, Python looks for a command whose file name is the entire string you were passing in, which of course doesn't exist.
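For completeness, a small sketch of the shell=True route on a POSIX system. The wrapper name run_pipeline is made up for illustration, and text=True assumes Python 3.7 or later.

```python
import subprocess


def run_pipeline(command):
    # shell=True hands the whole string to the shell, so pipes, redirections
    # and quotes are interpreted there. Only pass strings you trust.
    return subprocess.check_output(command, shell=True, text=True)
```

The command from the question could then be passed unchanged, for example run_pipeline("docker logs nc1 2>&1 |grep mortality| awk '{print $1}'|sort|uniq"), and the function returns the pipeline's standard output as a single string.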

How to correctly escape special characters in python subprocess?

I'm trying to run this bash command using Python's subprocess module:
find /Users/johndoe/sandbox -iname "*.py" | awk -F'/' '{ print $NF}'
output:-
helld.xl.py
parse_maillog.py
replace_pattern.py
split_text_match.py
ssh_bad_login.py
Here is what I have done in Python 2.7, but in the output the awk filter is not applied:
>>> p1=subprocess.Popen(["find","/Users/johndoe/sandbox","-iname","*.py"],stdout=subprocess.PIPE)
>>> p2=subprocess.Popen(['awk','-F"/"','" {print $NF} "'],stdin=p1.stdout,stdout=subprocess.PIPE)
>>> p2.communicate()
('/Users/johndoe/sandbox/argparse.py\n/Users/johndoe/sandbox/custom_logic_substitute.py\n/Users/johndoe/sandbox/finditer_html_parse.py\n/Users/johndoe/sandbox/finditer_simple.py\n/Users/johndoe/sandbox/group_regex.py\n/Users/johndoe/sandbox/helo.py\n/Users/johndoe/sandbox/newdir/helld.xl.py\n/Users/johndoe/sandbox/parse_maillog.py\n/Users/johndoe/sandbox/replace_pattern.py\n/Users/johndoe/sandbox/split_text_match.py\n/Users/johndoe/sandbox/ssh_bad_login.py\n', None)
I can also get the output by using p1 alone, as below, but I can't get awk working here:
list1=[]
result=p1.communicate()[0].split("\n")
for item in result:
    a=item.rstrip('/').split('/')
    list1.append(a[-1])
print list1
You are incorrectly passing in shell quoting (and extra shell quoting which isn't even required by the shell!) when you're not invoking a shell. Don't do that.
p2=subprocess.Popen(['awk', '-F/', '{print $NF}'], stdin=...
When you have shell=True you need extra quotes around some arguments to protect them from the shell, but there is no shell here, so putting them in is incorrect, and will cause parse errors by Awk.
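Put together, a two-process pipeline without a shell could be sketched as follows. The function name pipe is hypothetical; the Popen calls mirror the ones in the question.

```python
import subprocess


def pipe(first_argv, second_argv):
    # Start the producer with its stdout connected to a pipe.
    p1 = subprocess.Popen(first_argv, stdout=subprocess.PIPE)
    # The consumer reads from that pipe; its own stdout is captured.
    p2 = subprocess.Popen(second_argv, stdin=p1.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so p1 can receive SIGPIPE if p2 exits early.
    p1.stdout.close()
    output = p2.communicate()[0]
    p1.wait()
    return output
```

For the question's case this would be called as pipe(["find", "/Users/johndoe/sandbox", "-iname", "*.py"], ["awk", "-F/", "{print $NF}"]). Note that the arguments contain no shell quotes at all, and that on Python 3 the result is a bytes object.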
However, you should almost never need to call Awk from Python, especially for trivial tasks which Python can easily do natively:
list1 = [line.split('/')[-1]
         for line in subprocess.check_output(
             ["find", "/Users/johndoe/sandbox",
              "-iname", "*.py"]).splitlines()]
In this particular case, note also that GNU find already has a facility to produce this result directly:
list1 = subprocess.check_output(
    ["find", "/Users/johndoe/sandbox",
     "-iname", "*.py", "-printf", "%f\\n"]).splitlines()
Use this: p2.communicate()[0].split("\n").
It will output a list of lines.
If you have no reservations about using shell=True, then this should be a pretty simple solution:
from subprocess import Popen
import subprocess
command='''
find /Users/johndoe/sandbox -iname "*.py" | awk -F'/' '{ print $NF}'
'''
process=Popen(command,shell=True,stdout=subprocess.PIPE)
result=process.communicate()
print result

Executing awk command from python

I am trying to execute the following awk command from a python script
awk 'BEGIN {FS="\t"}; {print $1"\t"$2}' file_a > file_b
For this, I tried to use subprocess as follows:
subprocess.check_output(["awk", 'BEGIN {FS="\t"}; {print $1"\t"$2}',
                         file_a, ">",
                         file_b])
where file_a and file_b are strings pointing to the path of the files.
From this, I am getting the error
awk: cannot open > (No such file or directory)
I'm sure I'm passing the arguments to subprocess in the wrong way, but I can't figure out what's wrong.
While it may look like it in your shell of choice, >, <, and | are not actually passed as arguments to the program you run. Rather, they're a special part of the shell that the program never gets to see.
Since they're part of the shell, and not part of the OS or program, you have to emulate their effects yourself with the normal facilities the language gives you. In your case, since you're trying to redirect output to a file, simply use Python's open() as you would normally. The subprocess API supports arguments to specify stdout, stdin, and stderr, and you can supply any file object for those.
Check it out:
with open(file_b, 'wb') as f:
    subprocess.call(["awk", 'BEGIN {FS="\t"}; {print $1"\t"$2}', file_a], stdout=f)
Since subprocess.check_output redirects output already, it doesn't take the stdout argument. Using subprocess.call avoids this. If you also need the output later in the script, you can instead assign the return value of check_output to a variable, and then save that to file_b.
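The second option, keeping the output in a variable and also saving it, might be sketched like this. The helper name capture_and_save is an assumption made for illustration.

```python
import subprocess


def capture_and_save(argv, output_path):
    # check_output returns the child's stdout as bytes and raises
    # CalledProcessError if the command exits with a non-zero status.
    output = subprocess.check_output(argv)
    with open(output_path, "wb") as f:
        f.write(output)
    return output
```

Here that would be something like data = capture_and_save(["awk", 'BEGIN {FS="\t"}; {print $1"\t"$2}', file_a], file_b), after which data is still available to the rest of the script.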
If you use a lot of shell commands, you might also want to check out Plumbum, which gives you a large set of fairly silly shell-like operator overloads.

python variables in awk

I like Python and I like awk too, and I know that I can use awk via subprocess or the command library, but I want to use awk with variables defined earlier in Python, as in this simple example:
file = 'file_i_want_read.list'
awk '{print $0}' file > another_file
Does anybody know how I can do this, or something similar?
The easy way to do this is to not use the shell, and instead just pass a list of arguments to subprocess, so file is just one of those arguments.
The only trick is that if you don't use the shell, you can't use shell features like redirection; you have to use the equivalent subprocess features. Like this:
with open('another_file', 'wb') as output:
    subprocess.check_call(['awk', '{print $0}', file], stdout=output)
If you really want to use shell redirection instead, then you have to build a shell command line. That's mainly just a matter of using your favorite Python string manipulation methods. But you need to be careful to make sure to quote and/or escape things; e.g., if file might be file i want read.list, then that will show up as 4 separate arguments unless you put it in quotes. shlex.quote can do that for you. So:
cmdline = "awk '{print $0}' %s > another_file" % (shlex.quote(file),)
subprocess.check_call(cmdline, shell=True)
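To see what the quoting does, here is a small sketch that only builds the command line without running it. The function name build_cmdline is hypothetical, and quoting the output path as well is an addition to the snippet above.

```python
import shlex


def build_cmdline(input_path, output_path="another_file"):
    # Quote both paths so that spaces or shell metacharacters stay in one word.
    return "awk '{print $0}' %s > %s" % (
        shlex.quote(input_path), shlex.quote(output_path))
```

For a name with spaces, build_cmdline("file i want read.list") wraps the file name in single quotes, so the shell sees it as one argument; a plain name such as plain.txt is left unquoted.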

Python equivalent to perl -pe?

I need to pick some numbers out of some text files. I can pick out the lines I need with grep, but didn't know how to extract the numbers from the lines. A colleague showed me how to do this from bash with perl:
cat results.txt | perl -pe 's/.+(\d\.\d+)\.\n/\1 /'
However, I usually code in Python, not Perl. So my question is, could I have used Python in the same way? I.e., could I have piped something from bash to Python and then gotten the result straight to stdout? ... if that makes sense. Or is Perl just more convenient in this case?
Yes, you can use Python from the command line. python -c <stuff> will run <stuff> as Python code. Example:
python -c "import sys; print sys.path"
There isn't a direct equivalent to the -p option for Perl (the automatic input/output line-by-line processing), but that's mostly because Python doesn't use the same concept of $_ and whatnot that Perl does - in Python, all input and output is done manually (via raw_input()/input(), and print/print()).
For your particular example:
cat results.txt | python -c "import re, sys; print ''.join(re.sub(r'.+(\d\.\d+)\.\n', r'\1 ', line) for line in sys.stdin)"
(Obviously somewhat more unwieldy. It's probably better to just write the script to do it in actual Python.)
You can use:
$ python -c '<your code here>'
You can in theory, but Python doesn't have anywhere near as much regex magic as Perl does, so the resulting command will be much more unwieldy, especially as you can't use regular expressions without importing re (and you'll probably need sys for sys.stdin too).
The Python equivalent of your colleague's Perl one-liner is approximately:
import sys, re
for line in sys.stdin:
    print re.sub(r'.+(\d\.\d+)\.\n', r'\1 ', line)
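A Python 3 flavoured sketch of the same idea follows, written as a small script rather than a one-liner. The names PATTERN, transform and main are made up for illustration; the regular expression is the one from the Perl command.

```python
import re
import sys

# Same pattern as the Perl one-liner: one digit, a dot, more digits, a final dot.
PATTERN = re.compile(r'.+(\d\.\d+)\.\n')


def transform(line):
    # Matching lines are reduced to the number plus a space;
    # lines that do not match pass through unchanged.
    return PATTERN.sub(r'\1 ', line)


def main(stream=sys.stdin, out=sys.stdout):
    for line in stream:
        # write() adds no newline of its own, which is closer to perl -p.
        out.write(transform(line))
```

Saved as a script with a call to main() at the bottom, this could be used as cat results.txt | python3 script.py, where script.py is a placeholder name.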
You have a problem which can be solved several ways.
I think you should consider using regular expressions (what perl is doing in your example) directly from Python. Regular expressions are in the re module. An example would be:
import re
filecontent = open('somefile.txt').read()
print re.findall(r'.+(\d\.\d+)\.$', filecontent, re.MULTILINE)
(I would prefer using $ instead of '\n' for line endings, because line endings differ between operating systems and file encodings. Note that $ matches at the end of every line only when re.MULTILINE is passed; without it, $ matches only at the very end of the string.)
If you want to call bash commands from inside Python, you could use:
import os
os.system(mycommand)
Here, mycommand is the bash command. I use it all the time, because some operations are easier to perform in bash than in Python.
Finally, if you want to extract the numbers with grep, use the -o option, which prints only the matched part.
Perl (or sed) is more convenient. However it is possible, if ugly:
python -c 'import sys, re; print "\n".join(re.sub(r".+(\d\.\d+)\.\n", r"\1 ", l) for l in sys.stdin)'
Quoting from https://stackoverflow.com/a/12259852/411282:
for ln in __import__("fileinput").input(): print ln.rstrip()
See the explanation linked above, but this does much more of what perl -p does, including support for multiple file names and stdin when no filename is given.
https://docs.python.org/3/library/fileinput.html#fileinput.input
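In Python 3 the same approach might be sketched as a small generator. The name stripped_lines is hypothetical; fileinput.input and its files parameter are from the standard library.

```python
import fileinput


def stripped_lines(paths=None):
    # With paths=None, fileinput reads the files named in sys.argv[1:],
    # or standard input if no file names were given.
    for line in fileinput.input(files=paths):
        yield line.rstrip()
```

A script that ends with for line in stripped_lines(): print(line) would then accept either file names on the command line or piped input.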
You can use Python to execute code directly from your bash command line by using python -c, or you can process input piped to stdin using sys.stdin.
