Storing bash commands in python list - python

I want to run a bash command and append it's output in a list. I tried
import commands
v = commands.getstatusoutput("ls")
but v will be an output of (0,"file1\nfile2\ndir1\n")
I could process the 1st (0th based) element in v but that means I would have to scan through the string. I was wondering if there was a more pythonic way to approach this. I do not have to commands. I am open to other ways to perform the same task

In this example, you only have to split the string on the newline character:
output = v[1].split("\n")
In general, you should use the Python equivalent where possible instead of calling out to the shell:
files = os.listdir(".")
For shell commands without a Python equivalent, look at the subprocess module, but
having to parse the output is not non-Pythonic, it's just something you'll have to do.

Related

subprocess.call with command having embedded spaces and quotes

I would like to retrieve output from a shell command that contains spaces and quotes. It looks like this:
import subprocess
cmd = "docker logs nc1 2>&1 |grep mortality| awk '{print $1}'|sort|uniq"
subprocess.check_output(cmd)
This fails with "No such file or directory". What is the best/easiest way to pass commands such as these to subprocess?
The absolutely best solution here is to refactor the code to replace the entire tail of the pipeline with native Python code.
import subprocess
from collections import Counter
s = subprocess.run(
["docker", "logs", "nc1"],
text=True, capture_output=True, check=True)
count = Counter()
for line in s.stdout.splitlines():
if "mortality" in line:
count[line.split()[0]] += 1
for count, word in count.most_common():
print(count, word)
There are minor differences in how Counter objects resolve ties (if two words have the same count, the one which was seen first is returned first, rather than by sort order), but I'm guessing that's unimportant here.
I am also ignoring standard output from the subprocess; if you genuinely want to include output from error messages, too, just include s.stderr in the loop driver too.
However, my hunch is that you don't realize your code was doing that, which drives home the point nicely: Mixing shell script and Python raises the mainainability burden, because now you have to understand both shell script and Python to understand the code.
(And in terms of shell script style, I would definitely get rid of the useless grep by refactoring it into the Awk script, and probably also fold in the sort | uniq which has a trivial and more efficient replacement in Awk. But here, we are replacing all of that with Python code anyway.)
If you really wanted to stick to a pipeline, then you need to add shell=True to use shell features like redirection, pipes, and quoting. Without shell=True, Python looks for a command whose file name is the entire string you were passing in, which of course doesn't exist.

Import multiple files output from bash script into Python lists

I have a bash script that connects to multiple compute nodes and pulls data from each one depending on some arguments entered after the bash script is called. For simplicity sake, I'm essentially doing this:
for h in node{0..7}; do ssh $h 'fold -w 80 /program.headers | grep "RA"
| head -600 | tr -d "RA =" > '$h'filename'; done
I'm trying to take the 8 files that come out of this (each have 600 pieces of information) and save them each as a list in Python. I then need to manipulate them in Python (split and convert to float) to be able to plot the data with Matplotlib.
For a bash script that only outputs one file, I can easily make a variable name equal to check_output and then manipulate from there:
test = subprocess.check_output("./bashscript")
test_list = test.split()
test = [float(a) for a in test_list]
I am also able to read a saved file from my bash script by using:
test = subprocess.check_output(['cat', '/path/filename'])
test_list = test.split()
test = [float(a) for a in test_list]
The problem is, I'm working with over 80 files after I get all that I need. Is there some way in Python to say, "for every file made store the contents of that as a list"?
Instead of capturing data by using subprocess you can use os.popen() to execute scripts. The benefit of using it is that you can read the output of a command/script as you are reading a file. So you can use read(), readlines(),readline() according to your wish which all will give you a list. By using that you can execute the script and capture output like this
import os
output=os.popen("./bashscript").readlines() #now output has the op of bashsceipt with each line as a seperate item as list.
check this for more info on how to use os.popen(). check this to know difference between read(),readlines(),readline(),xreadlines()
Define a simple interface between your bash script and your python script
It looks like the simple interface used to be a print out of the file, but this solution did not scale to multiple files. Now, I recommend the interface be printing out the names of files created. It would look something like this:
filenames = subprocess.check_output("./bashscript").split()
for filename in filenames:
with open(filename) as file_obj:
file_data = [float(a) for a in file_obj.readlines()]
It looks like you are unfamiliar with Python but are familiar with bash. As a result, you are programming hobbled on bash crutches, instead you should embrace Python and use it in your application. You probably do not need the bash script at all.

Passing piped data to Python program and also an input file

I want to pass data to a Python file using a pipe and also specifying an input file like:
cat file.txt|python script.py -u configuration.txt
I currently have this:
for line in fileinput.input(mode='rU'):
print(line)
I know there can be something with sys.argv but maybe using fileinput there is a clean way to do it?
Thanks.
From the documentation:
If a filename is '-', it is also replaced by sys.stdin. To specify an alternative list of filenames, pass it as the first argument to input().
So you can create a list containing '-' as well as the contents of sys.argv[1:] (the default), and pass that to input(). Or alternatively just put - in the list of arguments of your Python program:
cat file.txt|python script.py -u - configuration.txt
or
cat file.txt|python script.py -u configuration.txt -
depending on whether you want data provided on standard input to be processed before or after the contents of configuration.txt.
If you want to do anything more complicated than just processing the contents of standard input as if it were an input file, you probably should not be using the fileinput module.

Running grep through Python - doesn't work

I have some code like this:
f = open("words.txt", "w")
subprocess.call(["grep", p, "/usr/share/dict/words"], stdout=f)
f.close()
I want to grep the MacOs dictionary for a certain pattern and write the results to words.txt. For example, if I want to do something like grep '\<a.\>' /usr/share/dict/words, I'd run the above code with p = "'\<a.\>'". However, the subprocess call doesn't seem to work properly and words.txt remains empty. Any thoughts on why that is? Also, is there a way to apply regex to /usr/share/dict/words without calling a grep-subprocess?
edit:
When I run grep '\<a.\>' /usr/share/dict/words in my terminal, I get words like: aa
ad
ae
ah
ai
ak
al
am
an
ar
as
at
aw
ax
ay as results in the terminal (or a file if I redirect them there). This is what I expect words.txt to have after I run the subprocess call.
Like #woockashek already commented, you are not getting any results because there are no hits on '\<a.\>' in your input file. You are probably actually hoping to find hits for \<a.\> but then obviously you need to omit the single quotes, which are messing you up.
Of course, Python knows full well how to look for a regex in a file.
import re
rx = re.compile(r'\ba.\b')
with open('/usr/share/dict/words', 'Ur') as reader, open('words.txt', 'w') as writer:
for line in reader:
if rx.search(line):
print(line, file=writer, end='')
The single quotes here are part of Python's string syntax, just like the single quotes on the command line are part of the shell's syntax. In neither case are they part of the actual regular expression you are searching for.
The subprocess.Popen documentation vaguely alludes to the frequently overlooked fact that the shell's quoting is not necessary or useful when you don't have shell=True (which usually you should avoid anyway, for this and other reasons).
Python unfortunately doesn't support \< and \> as word boundary operators, so we have to use (the functionally equivalent) \b instead.
The standard input and output channels for the process started by call() are bound to the parent’s input and output. That means the calling programm cannot capture the output of the command. Use check_output() to capture the output for later processing:
import subprocess
f = open("words.txt", "w")
output = subprocess.check_output(['grep', p ,'-1'])
file.write(output)
print output
f.close()
PD: I hope it works, i cant check the answer because i have not MacOS to try it.

running BLAST (bl2seq) without creating sequence files

I have a script that performs BLAST queries (bl2seq)
The script works like this:
Get sequence a, sequence b
write sequence a to filea
write sequence b to fileb
run command 'bl2seq -i filea -j fileb -n blastn'
get output from STDOUT, parse
repeat 20 million times
The program bl2seq does not support piping.
Is there any way to do this and avoid writing/reading to the harddrive?
I'm using Python BTW.
Depending on what OS you're running on, you may be able to use something like bash's process substitution. I'm not sure how you'd set that up in Python, but you're basically using a named pipe (or named file descriptor). That won't work if bl2seq tries to seek within the files, but it should work if it just reads them sequentially.
How do you know bl2seq does not support piping.? By the way, pipes is an OS feature, not the program. If your bl2seq program outputs something, whether to STDOUT or to a file, you should be able to parse the output. Check the help file of bl2seq for options to output to file as well, eg -o option. Then you can parse the file.
Also, since you are using Python, an alternative you can use is BioPython module.
Is this the bl2seq program from BioPerl? If so, it doesn't look like you can do piping to it. You can, however, code your own hack using Bio::Tools::Run::AnalysisFactory::Pise, which is the recommended way of going about it. You'd have to do it in Perl, though.
If this is a different bl2seq, then disregard the message. In any case, you should probably provide some more detail.
Wow. I have it figured out.
The answer is to use python's subprocess module and pipes!
EDIT: forgot to mention that I'm using blast2 which does support piping.
(this is part of a class)
def _query(self):
from subprocess import Popen, PIPE, STDOUT
pipe = Popen([BLAST,
'-p', 'blastn',
'-d', self.database,
'-m', '8'],
stdin=PIPE,
stdout=PIPE)
pipe.stdin.write('%s\n' % self.sequence)
print pipe.communicate()[0]
where self.database is a string containing the database filename, ie 'nt.fa'
self.sequence is a string containing the query sequence
This prints the output to the screen but you can easily just parse it. No slow disk I/O. No slow XML parsing. I'm going to write a module for this and put it on github.
Also, I haven't gotten this far yet but I think you can do multiple queries so that the blast database does not need to be read and loaded into RAM for each query.
I call blast2 using R script:
....
system("mkfifo seq1")
system("mkfifo seq2")
system("echo sequence1 > seq1"), wait = FALSE)
system("echo sequence2 > seq2"), wait = FALSE)
system("blast2 -p blastp -i seq1 -j seq2 -m 8", intern = TRUE)
....
This is 2 times slower(!) vs. writing and reading from hard drive!

Categories

Resources