Write a Python script to run a shell command?

I ran svn status and it shows the modified files:
svn status
? .settings
? .buildpath
? .directory
A A.php
M B.php
D html/C.html
M html/D.fr
M api/E.api
M F.php
..
Afterwards I want to archive all of these files:
tar zcvf MY.tar.gz <all the files that svn status displays>
(not including ?, just M, A, D)
My idea is to create a Python script that runs this shell command, because right now I copy the file names one by one.
Could anybody guide me on how to do this, or point me to a related tutorial? If you think it is harder than copy and paste, I will give up trying.
thanks

I don't see why you would use Python for this if you can do it in a single line of code in the shell.
svn status | grep '^[AMD]' | sed 's/^.\{8\}//' | xargs tar zcvf MY.tar.gz
I used grep to select only the added, modified, and deleted lines; if you want every file that svn status lists (including the unversioned ? entries) you can leave that part out. I used sed to strip the 8-column status prefix from every line; I'm sure there is an easier way to do that, but I can't think of one right now.

Once you figure out your command as a string, you can just call it with the subprocess module. This module spawns child processes and allows you to control them. From there it's up to you.
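For instance, a minimal sketch that hands the one-liner above to the shell (assuming none of the filenames contain spaces):
import subprocess

# shell=True is required because the command relies on shell pipes.
cmd = r"""svn status | grep '^[AMD]' | sed 's/^.\{8\}//' | xargs tar zcvf MY.tar.gz"""
subprocess.check_call(cmd, shell=True)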

You could use check_output() and check_call() functions:
#!/usr/bin/env python
from subprocess import check_call, check_output as qx

filenames = [line[8:]  # extract the filename after the 8-column status prefix
             for line in qx(['svn', 'status']).splitlines()
             if not line.startswith('?')]  # exclude files that are not under VC
check_call(['tar', 'cvfz', 'MY.tar.gz'] + filenames)
On Python < 2.7, see the check_output() recipe.

subprocess is the Pythonic way, but a small bash one-liner could be shorter.
Bash one-liner
svn status | egrep '^[AMD]' | awk '{s=s $2 " "} END {print "tar cvfz MY.tar.gz " s}' | sh
(Drop the trailing | sh to just print the tar command instead of running it.)
Subprocess
import subprocess as sp

p = sp.Popen('svn status', shell=True, stdout=sp.PIPE,
             stderr=sp.PIPE).communicate()[0]
files = []
for line in p.splitlines():  # iterate over lines, not characters
    if line[:1] in ('M', 'A', 'D'):  # keep modified/added/deleted entries
        files.append(line.split()[1])
files = ' '.join(files)
sp.Popen('tar cvfz MY.tar.gz ' + files, shell=True).communicate()

Related

How to correctly escape special characters in python subprocess?

I'm trying to run this bash command using Python subprocess:
find /Users/johndoe/sandbox -iname "*.py" | awk -F'/' '{ print $NF}'
Output:
helld.xl.py
parse_maillog.py
replace_pattern.py
split_text_match.py
ssh_bad_login.py
Here is what I have done the Python 2.7 way, but the awk filter does not seem to be applied to the output:
>>> p1=subprocess.Popen(["find","/Users/johndoe/sandbox","-iname","*.py"],stdout=subprocess.PIPE)
>>> p2=subprocess.Popen(['awk','-F"/"','" {print $NF} "'],stdin=p1.stdout,stdout=subprocess.PIPE)
>>> p2.communicate()
('/Users/johndoe/sandbox/argparse.py\n/Users/johndoe/sandbox/custom_logic_substitute.py\n/Users/johndoe/sandbox/finditer_html_parse.py\n/Users/johndoe/sandbox/finditer_simple.py\n/Users/johndoe/sandbox/group_regex.py\n/Users/johndoe/sandbox/helo.py\n/Users/johndoe/sandbox/newdir/helld.xl.py\n/Users/johndoe/sandbox/parse_maillog.py\n/Users/johndoe/sandbox/replace_pattern.py\n/Users/johndoe/sandbox/split_text_match.py\n/Users/johndoe/sandbox/ssh_bad_login.py\n', None)
I can also get output by using p1 alone, as below, but I can't get awk working there either:
list1 = []
result = p1.communicate()[0].split("\n")
for item in result:
    a = item.rstrip('/').split('/')
    list1.append(a[-1])
print list1
You are incorrectly passing in shell quoting (and extra shell quoting which isn't even required by the shell!) when you're not invoking a shell. Don't do that.
p2=subprocess.Popen(['awk', '-F/', '{print $NF}'], stdin=...
When you have shell=True you need extra quotes around some arguments to protect them from the shell, but there is no shell here, so putting them in is incorrect, and will cause parse errors by Awk.
However, you should almost never need to call Awk from Python, especially for trivial tasks which Python can easily do natively:
list1 = [line.split('/')[-1]
         for line in subprocess.check_output(
             ["find", "/Users/johndoe/sandbox",
              "-iname", "*.py"]).splitlines()]
In this particular case, note also that GNU find already has a facility to produce this result directly:
list1 = subprocess.check_output(
    ["find", "/Users/johndoe/sandbox",
     "-iname", "*.py", "-printf", "%f\\n"]).splitlines()
Use this: p2.communicate()[0].split("\n").
It will output a list of lines.
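Putting the quoting fix together, here is a minimal sketch of the corrected two-process pipeline (same find and awk arguments as in the question):
import subprocess

p1 = subprocess.Popen(['find', '/Users/johndoe/sandbox', '-iname', '*.py'],
                      stdout=subprocess.PIPE)
# No extra quotes around the awk program: there is no shell here to strip them.
p2 = subprocess.Popen(['awk', '-F/', '{print $NF}'],
                      stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # lets find receive SIGPIPE if awk exits early
print p2.communicate()[0].split("\n")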
If you don't have any reservations about using shell=True, then this is a pretty simple solution:
from subprocess import Popen
import subprocess
command='''
find /Users/johndoe/sandbox -iname "*.py" | awk -F'/' '{ print $NF}'
'''
process = Popen(command, shell=True, stdout=subprocess.PIPE)
result = process.communicate()
print result

Converting a file from .sam to .bam using python subprocess

I would like to start out by saying any help is greatly appreciated. I'm new to Python and scripting in general. I am trying to use a program called samtools view to convert a file from .sam to .bam. I need to be able to do in Python what this bash command does:
samtools view -bS aln.sam > aln.bam
I understand that bash operators like | > < are handled in Python with subprocess's stdin, stdout and stderr. I have tried a few different methods and still can't get my bash command converted correctly. I have tried:
cmd = subprocess.call(["samtools view","-bS"], stdin=open(aln.sam,'r'), stdout=open(aln.bam,'w'), shell=True)
and
from subprocess import Popen
with open(SAMPLE+ "."+ TARGET+ ".sam",'wb',0) as input_file:
with open(SAMPLE+ "."+ TARGET+ ".bam",'wb',0) as output_file:
cmd = Popen([Dir+ "samtools-1.1/samtools view",'-bS'],
stdin=(input_file), stdout=(output_file), shell=True)
in Python and am still not getting samtools to convert a .sam to a .bam file. What am I doing wrong?
Abukamel is right, but in case you (or others) are wondering about your specific examples....
You're not too far off with your first attempt, just a few minor items:
Filenames should be in quotes
samtools reads from a named input file, not from stdin
You don't need "shell=True" since you're not using shell tricks like redirection
So you can do:
import subprocess
subprocess.call(["samtools", "view", "-bS", "aln.sam"],
                stdout=open('aln.bam', 'w'))
Your second example has more or less the same issues, so would need to be changed to something like:
from subprocess import Popen
with open('aln.bam', 'wb', 0) as output_file:
    cmd = Popen(["samtools", "view", '-bS', 'aln.sam'],
                stdout=output_file)
You can pass execution to the shell with the kwarg shell=True:
subprocess.call('samtools view -bS aln.sam > aln.bam', shell=True)
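If you go the shell=True route, it may also be worth using check_call so a failed conversion raises an error instead of passing silently; a small sketch:
import subprocess

# check_call raises CalledProcessError on a non-zero exit status.
subprocess.check_call('samtools view -bS aln.sam > aln.bam', shell=True)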

python subprocess.popen redirect to create a file

I've been searching for how to do this without any success. I've inherited a python script for performing an hourly backup on our database. The original script is too slow, so I'm trying a faster utility. My new command would look like this if typed into a shell:
pg_basebackup -h 127.0.0.1 -F t -X f -c fast -z -D - --username=postgres > db.backup.tgz
The problem is that the original script uses call(cmd), and that fails when the above is the cmd string. I've been looking at how to switch to popen, but I cannot find any examples where a file-creating redirect like > is used. pg_basebackup as shown writes to stdout. The only way I've succeeded so far is to change -D - to -D some.file.tgz and then move the file to the archive, but I'd rather do this in one step.
Any ideas?
Jay
Maybe like this?
import subprocess

with open("db.backup.tgz", "ab") as stdout:
    p = subprocess.Popen(cmd_without_redirector, stdout=stdout,
                         stderr=stdout, shell=True)
    p.wait()
Hmmm... The pg_basebackup executable must be able to attach to that file. If I open the file in the manner you suggest, I don't know the correct Python syntax to make that happen. If I try putting either " > " or " >> " in the string passed to call(), Python chokes on it. That's my real problem: I'm not finding any guidance on it.
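For what it's worth, a minimal sketch that sidesteps the > entirely: split the command into a list, drop shell=True, and let Python hold the output file (this assumes pg_basebackup is on the PATH):
import subprocess

cmd = ['pg_basebackup', '-h', '127.0.0.1', '-F', 't', '-X', 'f',
       '-c', 'fast', '-z', '-D', '-', '--username=postgres']

# '-D -' makes pg_basebackup write the tarball to stdout; Python
# redirects stdout into the file, so no shell '>' is needed.
with open('db.backup.tgz', 'wb') as out:
    subprocess.check_call(cmd, stdout=out)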

python or bash script to pass all files in a folder to java command line

I have the following Java command line working fine on Mac OS.
java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer file.txt > output.txt
Multiple files can be passed as input, separated by spaces, as follows.
java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer file1.txt file2.txt > output.txt
Now I have 100 files in a folder. All these files have to be passed as input to this command. I used Python's os.system in a for loop over the directory listing, as follows:
for i, f in enumerate(os.listdir(filedir)):
    os.system('java -cp "stanford-ner.jar" edu.stanford.nlp.process.PTBTokenizer "%s" > "annotate_%s.txt"' % (f, i))
This works fine only for the first file. For all the other outputs (annotate_1, annotate_2, and so on) it creates only empty files. I thought of looping over the files and passing each one to subprocess.popen(), but that seems hopeless. Now I am thinking of passing the files one by one in a loop, executing the command sequentially from a bash script. I am also wondering whether I can execute at least 10 files in parallel in different terminals at a time. Any solution is fine, but I think this question will help me gain some insight into the different options.
If you want to do this from the shell instead of Python, the xargs tool can almost do everything you want.
You give it a command with a fixed list of arguments, and feed it input with a bunch of filenames, and it'll run the command multiple times, using the same fixed list plus a different batch of filenames from its input. The --max-args option sets the size of the biggest group. If you want to run things in parallel, the --max-procs option lets you do that.
But that's not quite there, because it doesn't do the output redirection. But… do you really need 10 separate files instead of 1 big one? Because if 1 big one is OK, you can just redirect all of them to it:
ls | xargs --max-args=10 --max-procs=10 java -cp stanford-ner.jar \
    edu.stanford.nlp.process.PTBTokenizer >> output.txt
To pass all .txt files in the current directory at once to the java subprocess:
#!/usr/bin/env python
from glob import glob
from subprocess import check_call

cmd = 'java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer'.split()
with open('output.txt', 'wb', 0) as file:
    check_call(cmd + glob('*.txt'), stdout=file)
It is similar to running the shell command but without running the shell:
$ java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer *.txt > output.txt
To run no more than 10 subprocesses at a time, passing no more than 100 files at a time, you could use multiprocessing.pool.ThreadPool:
#!/usr/bin/env python
from glob import glob
from multiprocessing.pool import ThreadPool
from subprocess import call
try:
    from threading import get_ident  # Python 3.3+
except ImportError:  # Python 2
    from thread import get_ident

cmd = 'java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer'.split()

def run_command(files):
    with open('output%d.txt' % get_ident(), 'ab', 0) as file:
        return files, call(cmd + files, stdout=file)

all_files = glob('*.txt')
file_groups = (all_files[i:i+100] for i in range(0, len(all_files), 100))
for _ in ThreadPool(10).imap_unordered(run_command, file_groups):
    pass
It is similar to this xargs command (suggested by @abarnert):
$ ls *.txt | xargs --max-procs=10 --max-args=100 java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer >>output.txt
except that each thread in the Python script writes to its own output file to avoid corrupting the output due to parallel writes.
If you have 100 files, and you want to kick off 10 processes, each handling 10 files, all in parallel, that's easy.
First, you want to group them into chunks of 10. You can do this with slicing or with zipping iterators; in this case, since we definitely have a list, let's just use slicing:
files = os.listdir(filedir)
groups = [files[i:i+10] for i in range(0, len(files), 10)]
Now, you want to kick off a process for each group, and then wait for all of the processes, instead of waiting for each one to finish before kicking off the next. This is impossible with os.system, which is one of the many reasons the os.system documentation says "The subprocess module provides more powerful facilities for spawning new processes…"
procs = [subprocess.Popen(…) for group in groups]
for proc in procs:
    proc.wait()
So, what do you pass on the command line to give it 10 filenames instead of 1? If none of the names have spaces or other special characters, you can just ' '.join them. But otherwise, it's a nightmare. Another reason subprocess is better: you can just pass a list of arguments:
procs = [subprocess.Popen(['java', '-cp', 'stanford-ner.jar',
                           'edu.stanford.nlp.process.PTBTokenizer'] + group)
         for group in groups]
But now how do you get all of the results?
One way is to go back to using a shell command line with the > redirection. But a better way is to do it in Python:
procs = []
files = []
for i, group in enumerate(groups):
    file = open('output_{}'.format(i), 'w')
    files.append(file)
    procs.append(subprocess.Popen([…same as before…], stdout=file))
for proc in procs:
    proc.wait()
for file in files:
    file.close()
(You might want to use a with statement with ExitStack, but I wanted to make sure this didn't require Python 2.7/3.3+, so I used explicit close.)
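For completeness, a sketch of that ExitStack variant (Python 3.3+ only; it reuses the groups list from above):
import subprocess
from contextlib import ExitStack

with ExitStack() as stack:
    procs = []
    for i, group in enumerate(groups):
        # ExitStack closes every registered file on exit, even on error.
        file = stack.enter_context(open('output_{}'.format(i), 'w'))
        procs.append(subprocess.Popen(['java', '-cp', 'stanford-ner.jar',
                                       'edu.stanford.nlp.process.PTBTokenizer'] + group,
                                      stdout=file))
    for proc in procs:
        proc.wait()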
Inside your input file directory you can do the following in bash:
#!/bin/bash
for file in *.txt
do
    input=$input" \"$file\""
done
java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer $input > output.txt
If you want to run it as a script, save the file with some name, my_exec.bash:
#!/bin/bash
if [ $# -ne 2 ]; then
    echo "Invalid input. Enter a directory and an output file"
    exit 1
fi
if [ ! -d $1 ]; then
    echo "Please pass a valid directory"
    exit 1
fi
for file in $1*.txt
do
    input=$input" \"$file\""
done
java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer $input > $2
Make it executable:
chmod +x my_exec.bash
USAGE:
./my_exec.bash <folder> <output_file>

Escaping escape sequence in Python

I am kind of new to Python. The goal is to execute a shell command using subprocess and to parse/retrieve the printed output from the shell. The execution errors out as shown in the sample output below. Also shown below is the sample code snippet.
Code snippet:
testStr = "cat tst.txt | grep Location | sed -e '/.*Location: //g' "
print "testStr = "+testStr
testStrOut = subprocess.Popen([testStr],shell=True,stdout=subprocess.PIPE).communicate()[0]
Output:
testStr = cat tst.txt | grep Location | sed -e '/.*Location: //g'
cat: tst.txt: No such file or directory
sed: -e expression #1, char 15: unknown command: `/'
Is there a workaround or a function that could be used ?
Appreciate your help
Thanks
I suppose your main error is not Python-related. To be more precise, there are three of them:
You forgot to import subprocess.
It should be sed -e 's/.*Location: //g'. You wrote ///g instead of s///g.
tst.txt does not exist.
You should be passing testStr directly as the first argument, rather than enclosing it in a list. See subprocess.Popen, the paragraph that starts "On Unix, with shell=True: ...".
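With those fixes applied, a working version of the snippet might look like this (still assuming tst.txt exists):
import subprocess

testStr = "cat tst.txt | grep Location | sed -e 's/.*Location: //g'"
# Pass the string itself when shell=True; the shell does the word splitting.
testStrOut = subprocess.Popen(testStr, shell=True,
                              stdout=subprocess.PIPE).communicate()[0]
print testStrOut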
