Why can't I pass arguments with subprocess.PIPE in Python?

I'm trying to do something really easy in order to learn how to use subprocess in Python.
What I'm trying is this:
ll | egrep "*gz"
So after reading the Python manual (which I didn't understand very well), I tried this:
lista = subprocess.Popen(['ls', '-alF'], stdout=subprocess.PIPE)
filtro = subprocess.Popen(['egrep', '"*gz"'], stdin=lista.stdout, stdout=subprocess.PIPE)
filtro.communicate()[0]
But all I get is '' and I don't really know how to do this. I've read this but it seems I didn't get it at all... could somebody explain to me how this works so I can use it later with other commands?
Thanks in advance!!

The problem might be the double set of quotes around the argument to egrep. Try this instead:
import subprocess
ls = subprocess.Popen(['ls', '-alF'], stdout=subprocess.PIPE)
egrep = subprocess.Popen(['egrep', '\.gz$'], stdin=ls.stdout, stdout=subprocess.PIPE)
print egrep.communicate()[0]
I am assuming you are looking for files ending in ".gz" here, as your initial regex does not make sense. If you are simply looking for files ending in "gz", you would use 'gz$' instead. And if you do not care where in the line "gz" appears, simply use 'gz'.
Edit: Here is a full example. In a directory containing the three files "pipe.py", "test1.py.gz" and "test2.py.gz", where "pipe.py" is the above script, I execute:
$ python pipe.py
With the result
-rw-r--r-- 1 amaurea amaurea 146 Jan 30 20:54 test1.py.gz
-rw-r--r-- 1 amaurea amaurea 150 Jan 30 20:54 test2.py.gz
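Note that when chaining Popen objects like this, the subprocess documentation also recommends closing the upstream pipe in the parent, so the first process receives SIGPIPE if the second one exits early. A minimal sketch of the same pipeline with that extra step:
import subprocess
ls = subprocess.Popen(['ls', '-alF'], stdout=subprocess.PIPE)
egrep = subprocess.Popen(['egrep', '\.gz$'], stdin=ls.stdout, stdout=subprocess.PIPE)
ls.stdout.close()  # allow ls to receive SIGPIPE if egrep exits first
print egrep.communicate()[0]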

On Unix, take advantage of the shell argument:
lista = subprocess.Popen('ls -alF', stdout=subprocess.PIPE, shell=True)
filtro = subprocess.Popen('egrep "*gz"', stdin=lista.stdout, stdout=subprocess.PIPE, shell=True)
filtro.communicate()[0]
You can simply copy commands without worrying about breaking them into argument lists.
You can also specify the shell explicitly:
subprocess.Popen('bash -c "ls -l"', stdout=subprocess.PIPE, shell=True)
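With shell=True you can also hand the whole pipeline to a single shell and let it do the plumbing itself; a sketch using the corrected regex from the answer above:
import subprocess
filtro = subprocess.Popen('ls -alF | egrep "\.gz$"', stdout=subprocess.PIPE, shell=True)
print filtro.communicate()[0]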

Related

Execute df | grep -w "/" not parsing output correctly

I am trying to run the shell command df -h | grep -w "/" using Python to watch the root partition usage, and I wanted to avoid the shell=True option for security.
The code I tried is as follows:
import subprocess
p1 = subprocess.Popen(['df', '-h'], stdout=subprocess.PIPE)
p2 = subprocess.Popen(['grep', '-w', '"/"'], stdin=p1.stdout, stdout=subprocess.PIPE)
output=p2.communicate()[0]
print(output)
The output I get is:
$ ./subprocess_df_check.py
b''
Expected output is:
$ df -h | grep -w "/"
/dev/sdd 251G 4.9G 234G 3% /
The immediate problem is the unnecessary quotes being added.
p2 = subprocess.Popen(['grep', '-w', '"/"'], stdin=p1.stdout, stdout=subprocess.PIPE)
is not equivalent to the shell command grep -w "/". Instead, it's equivalent to the shell command grep -w '"/"' (or grep -w \"/\", or any other way of writing an argument vector whose last element contains literal double-quote characters), and it is wrong for the same reasons.
Use '/', not '"/"'.
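Concretely, a sketch of the corrected pipeline (only the grep argument changes):
import subprocess
p1 = subprocess.Popen(['df', '-h'], stdout=subprocess.PIPE)
p2 = subprocess.Popen(['grep', '-w', '/'], stdin=p1.stdout, stdout=subprocess.PIPE)  # '/' rather than '"/"'
print(p2.communicate()[0])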
Don't use subprocess with df and/or grep. Since you are already using Python, you can use the statvfs function instead:
import os
import time

path = "/"
while True:
    info = os.statvfs(path)
    print("Block size [%d] Free blocks [%d] Free inodes [%d]"
          % (info.f_bsize, info.f_bfree, info.f_ffree))
    time.sleep(15)
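If you specifically want a usage percentage like df reports, you can derive it from the same statvfs result; a rough sketch (df's exact rounding may differ slightly):
import os

info = os.statvfs("/")
# Sizes are in units of the fragment size f_frsize
used = (info.f_blocks - info.f_bfree) * info.f_frsize
avail = info.f_bavail * info.f_frsize
print("Root partition is at %.0f%% usage" % (100.0 * used / (used + avail)))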
Running grep in a separate subprocess is certainly unnecessary. If you are using Python, you already have an excellent tool for looking for substrings within strings.
df = subprocess.run(['df', '-h'],
                    capture_output=True, text=True, check=True)
for line in df.stdout.split('\n')[1:]:
    if '/' in line:
        print(line)
Notice also how you basically always want to prefer subprocess.run over Popen when you can, and how you want text=True to get text rather than bytes. Usually you also want check=True to ensure that the subprocess completed successfully.
OK, I figured out the whole thing.
import subprocess
p1 = subprocess.Popen(['df', '-h'], stdout=subprocess.PIPE)
p2 = subprocess.Popen(['grep', '-w', '/'], stdin=p1.stdout, stdout=subprocess.PIPE)
output=p2.communicate()[0].split()[4]
print("Root partition is of", output.decode(), "usage now")
Removed the unnecessary double quotes, changing subprocess.Popen(['grep', '-w', '"/"'], ...) to subprocess.Popen(['grep', '-w', '/'], ...). The double quotes are for the shell, not for grep. When you have no shell, you need no shell syntax.
In output=p2.communicate()[0].split()[4], the [0] picks only stdout, not stderr (which is None here because it was not piped). Then split()[4] takes the fifth whitespace-separated field (index 4), which is the disk usage percentage in df's output.
The decode() in output.decode() converts the bytes object to a str, so the result is printed without the leading b.
So the output of the script is:
$ ./subprocess_df_check.py
Root partition is of 3% usage now

python way to find unique version of software

I have multiple components of a piece of software (let's call it XYZ) installed on my Linux (RHEL 6.2) server running Python 3.3.
~$ rpm -qa | grep -i xyz
XYZman-5.3.1.9-1.x86_64
XYZconnect-5.3.1.9-1.x86_64
XYZconsole-5.3.1.9-1.x86_64
XYZnode-5.3.1.9-1.x86_64
I'm trying to convert my install/upgrade script from shell to Python. For that I need to fetch the version number, but only once. In my Python script I've added the code below:
>>> cmd = ("rpm -qa | grep -i xyz | awk -F[-] '{print $2}' | sort -u")
>>> sp = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
>>> (out, err) = sp.communicate()
>>> rcode = sp.returncode
>>> print(out.decode('utf-8').replace('\n', ''))
5.3.1.9
I want to use Python instead of awk and sort here. I think split() can replace awk, but I couldn't figure out the proper way to do it.
Can Python get unique values the way sort -u does in the shell?
You can define the delimiter to use in the split() method, like this:
>>> cmd = ("rpm -qa | grep -i xyz")
>>> sp = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
>>> (out, err) = sp.communicate()
>>> v = []
>>> for line in out.splitlines():
...     if line.split("-")[1] not in v:
...         v.append(line.split("-")[1])
...
>>> print v
['5.3.1.9']
It feels to me like you're trying to shoehorn Python into a process where it doesn't really belong. Four of the five lines you displayed were there to get bash to talk to Python. I highly recommend you do the bash stuff on the bash side, and then just pass whatever you need as an argument into Python.
Given all that, if you're still asking "How do I get an iterable that only has unique elements?", the answer is sets. Key functions for sets are simply add, remove (or discard if you're not sure the element is in the set), and update to merge one set into another. Or, if you have a list of non-unique items and want them unique, construct a new set: foo = set(my_non_unique_list)
Be aware that sets are unordered. Since you were talking about sorting, I don't think ordering is your concern, so sets should be just what you need.
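For example, a minimal sketch of the unique-version extraction with a set (same rpm/grep pipeline as in the question):
import subprocess

out = subprocess.check_output("rpm -qa | grep -i xyz", shell=True)
# A set keeps each version string only once; order is not preserved
versions = {line.split("-")[1] for line in out.decode().splitlines()}
print(versions)  # e.g. {'5.3.1.9'}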

How to correctly escape special characters in python subprocess?

I'm trying to run this bash command using Python subprocess:
find /Users/johndoe/sandbox -iname "*.py" | awk -F'/' '{ print $NF}'
Output:
helld.xl.py
parse_maillog.py
replace_pattern.py
split_text_match.py
ssh_bad_login.py
Here is what I have done the Python 2.7 way, but it gives output where the awk filter is not applied:
>>> p1=subprocess.Popen(["find","/Users/johndoe/sandbox","-iname","*.py"],stdout=subprocess.PIPE)
>>> p2=subprocess.Popen(['awk','-F"/"','" {print $NF} "'],stdin=p1.stdout,stdout=subprocess.PIPE)
>>> p2.communicate()
('/Users/johndoe/sandbox/argparse.py\n/Users/johndoe/sandbox/custom_logic_substitute.py\n/Users/johndoe/sandbox/finditer_html_parse.py\n/Users/johndoe/sandbox/finditer_simple.py\n/Users/johndoe/sandbox/group_regex.py\n/Users/johndoe/sandbox/helo.py\n/Users/johndoe/sandbox/newdir/helld.xl.py\n/Users/johndoe/sandbox/parse_maillog.py\n/Users/johndoe/sandbox/replace_pattern.py\n/Users/johndoe/sandbox/split_text_match.py\n/Users/johndoe/sandbox/ssh_bad_login.py\n', None)
I could also get the output by using p1 alone like below, but I can't get awk working here:
list1 = []
result = p1.communicate()[0].split("\n")
for item in result:
    a = item.rstrip('/').split('/')
    list1.append(a[-1])
print list1
You are incorrectly passing in shell quoting (and extra shell quoting which isn't even required by the shell!) when you're not invoking a shell. Don't do that.
p2=subprocess.Popen(['awk', '-F/', '{print $NF}'], stdin=...
When you have shell=True you need extra quotes around some arguments to protect them from the shell, but there is no shell here, so putting them in is incorrect, and will cause parse errors by Awk.
However, you should almost never need to call Awk from Python, especially for trivial tasks which Python can easily do natively:
list1 = [line.split('/')[-1]
         for line in subprocess.check_output(
             ["find", "/Users/johndoe/sandbox",
              "-iname", "*.py"]).splitlines()]
In this particular case, note also that GNU find already has a facility to produce this result directly:
list1 = subprocess.check_output(
    ["find", "/Users/johndoe/sandbox",
     "-iname", "*.py", "-printf", "%f\\n"]).splitlines()
Use this: p2.communicate()[0].split("\n").
It will output a list of lines.
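For example, with the p1/p2 pipeline from the question (a minimal sketch):
lines = p2.communicate()[0].split("\n")
for name in lines:
    print name  # each element is one line of awk's output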
If you don't have any reservations about using shell=True, then this is a pretty simple solution:
from subprocess import Popen
import subprocess
command='''
find /Users/johndoe/sandbox -iname "*.py" | awk -F'/' '{ print $NF}'
'''
process=Popen(command,shell=True,stdout=subprocess.PIPE)
result=process.communicate()
print result

How can I get my Python script to work using bash?

I am new to this site so hopefully this is the correct location to place this question.
I am trying to write a script in Python for Linux that:
creates a file, file.txt
appends the output of the lsof command to file.txt
reads each line of the output and appends it to an array
then prints each line.
I'm basically just doing this to familiarize myself with using Python for shell-type tasks. I'm new to this area, so any help would be great. I'm not sure where to go from here. Also, if there is a better way to do this, I'm open to that!
#!/usr/bin/env python
import subprocess
touch = "touch file.txt"
subprocess.call(touch, shell=True)
xfile = "file.txt"
connection_count = "lsof -i tcp | grep ESTABLISHED | wc -l"
count = subprocess.call(connection_count, shell=True)
if count > 0:
    connection_lines = "lsof -i tcp | grep ESTABLISHED >> file.txt"
    subprocess.call(connection_lines, shell=True)
with open(subprocess.call(xfile, shell=True), "r") as ins:
    array = []
    for line in ins:
        array.append(line)
for i in array:
    print i
subprocess.call returns the return code for the process that was started ($? in bash). This is almost certainly not what you want -- and explains why this line almost certainly fails:
with open(subprocess.call(xfile, shell=True), "r") as ins:
(you can't open a number).
Likely, you want to be using subprocess.Popen with stdout=subprocess.PIPE. Then you can read the output from the pipe. e.g. to get the count, you probably want something like:
connection_count = "lsof -i tcp | grep ESTABLISHED"
proc = subprocess.Popen(connection_count, shell=True, stdout=subprocess.PIPE)
# line counting moved to python :-)
count = sum(1 for unused_line in proc.stdout)
(you could also use Popen.communicate here)
Note, excessive use of shell=True is always a bit scary for me... It's much better to chain your pipes together as demonstrated in the documentation.
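For reference, the lsof pipeline from the question can be chained without shell=True along the lines of the documentation's pipeline example; a sketch (with the counting done in Python as above):
import subprocess
p1 = subprocess.Popen(["lsof", "-i", "tcp"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["grep", "ESTABLISHED"], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # let lsof receive SIGPIPE if grep exits first
count = sum(1 for unused_line in p2.stdout)
p2.wait()
print count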

Problems capturing Python subprocess output on Mac OS X

I'm running Python 3.3 on Mac OS 10.6.8. I am writing a script that runs several subprocesses, and I want to capture the output of each one and record it in a file. I'm having trouble with this.
I first tried the following:
import subprocess
logFile = open("log.txt", 'w')
proc = subprocess.Popen(args, stdout=logFile, stderr=logFile)
proc.wait()
This produced an empty log.txt. After poking around on the internet for a bit, I tried this instead
import subprocess
proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = proc.communicate()
logFile = open("log.txt", 'w')
logFile.write(output)
This, too, produced an empty log.txt. So instead of writing to the file, I tried to just print the output to the command line:
output, err = proc.communicate()
print(output)
print(err)
That produced this:
b''
b''
The process I'm trying to run is fastq_quality_trimmer. It takes an input file, filters it, and saves the result to a new file. It only writes a few lines to stdout, like so
Minimum Quality Threshold: 20
Minimum Length: 20
Input: 750000 reads.
Output: 750000 reads.
discarded 0 (0%) too-short reads.
If I run it from the command line and redirect the output like this
fastq_quality_trimmer -Q 33 -v -t 50 -l 20 -i in.fq -o in_trimmed.fq > log.txt
the output is successfully written to log.txt.
I thought perhaps that fastq_quality_trimmer was somehow failing to run when I called it with Popen, but my script produces a filtered file that is identical to the one produced when I run fastq_quality_trimmer from the command line. So it's working; I just can't capture the output. To make matters more confusing, I can successfully capture the output of other processes (echo, other Python scripts) using code that is essentially identical to what I've posted.
Any thoughts? Am I missing something blindingly obvious?
You forgot a comma:
["fastq_quality_trimmer", "-Q", "33" "-v", "-t", "50", "-l", "20", "-i", leftInitial, "-o", leftTrimmed]
Add it between "33" and "-v".
You are essentially passing in the arguments -Q 33-v instead of -Q 33 -v.
Python will concatenate two adjacent strings if there is only whitespace between them:
>>> "33", "-v"
('33', '-v')
>>> "33" "-v"
'33-v'
Since -v is the verbose switch that is required to make fastq_quality_trimmer produce output at all, it'll remain silent with it missing.
Whenever you encounter problems calling a subprocess, triple-check the command line that actually gets created. Prepending ['echo'] to args can help with that:
proc = subprocess.Popen(['echo'] + args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = proc.communicate()
print(output)
