How to execute a python script in bash in parallel

I am trying to run a bash script multiple times on a cluster. The issue, however, is that I need to grab certain file names to fill in the command, which I only know how to do in Python.
Note: I want to run the last line (the line that calls the script) in parallel, in groups of about two. How can I do this?
I have thought of writing all the commands to a .txt file and catting that in parallel. However, I feel that it is not the most efficient approach.
Thank you for any help.
The script looks like this:
#!/usr/bin/python
import os
import sys
cwd = os.getcwd()
for filename in os.listdir(cwd):
    if "_2_" in filename:
        continue
    elif "_1_" in filename:
        in1 = os.path.join(cwd, filename)
        secondread = filename.replace("_1.fastq_1_trimmed.fq", "_1.fastq_2_trimmed.fq")
        in2 = os.path.join(cwd, secondread)
        outrename = filename.replace("_1.fastq_1_trimmed.fq", ".bam")
        out = "/home/blmatsum/working/bamout/" + outrename
        cmd = "bbmap.sh ref=/home/blmatsum/working/datafiles/sequence.phages_clustered.fna in={} in2={} out={}".format(in1, in2, out)
        os.system(cmd)
Examples of the commands I want to run:
bbmap.sh ref=/home/working/datafiles/sequence.phages_clustered.fna in=/home/working/trimmed/SRR7077355_1.fastq_1_trimmed.fq in2=/home/working/trimmed/SRR7077355_1.fastq_2_trimmed.fq out=/home/working/bamout/SRR7077355.bam
bbmap.sh ref=/home/working/datafiles/sequence.phages_clustered.fna in=/home/working/trimmed/SRR7077366_1.fastq_1_trimmed.fq in2=/home/working/trimmed/SRR7077366_1.fastq_2_trimmed.fq out=/home/working/bamout/SRR7077366.bam
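One way to run these commands two at a time without leaving Python is a sketch like the following. It swaps `os.system` for `subprocess.run` with an argument list and uses the standard library's `concurrent.futures.ThreadPoolExecutor` with `max_workers=2`. Assumptions: read-1 files are selected by the `_1.fastq_1_trimmed.fq` suffix that the script's `replace` calls already rely on, the reference and output paths are copied from the script, and `build_commands` and `run_all` are hypothetical helper names.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Paths copied from the question's script; adjust for your cluster.
REF = "/home/blmatsum/working/datafiles/sequence.phages_clustered.fna"
OUT_DIR = "/home/blmatsum/working/bamout"
READ1 = "_1.fastq_1_trimmed.fq"  # assumed suffix of the read-1 files
READ2 = "_1.fastq_2_trimmed.fq"  # assumed suffix of the matching read-2 files


def build_commands(read_dir, out_dir=OUT_DIR, ref=REF):
    """Return one bbmap.sh argument list per read-1 file in read_dir."""
    commands = []
    for filename in sorted(os.listdir(read_dir)):
        if not filename.endswith(READ1):
            continue  # skips the read-2 files and anything unrelated
        in1 = os.path.join(read_dir, filename)
        in2 = os.path.join(read_dir, filename.replace(READ1, READ2))
        out = os.path.join(out_dir, filename.replace(READ1, ".bam"))
        commands.append(["bbmap.sh", "ref=" + ref, "in=" + in1,
                         "in2=" + in2, "out=" + out])
    return commands


def run_all(commands, max_workers=2):
    """Run the commands, at most max_workers at a time; return exit codes in order."""
    # Threads are enough here: each one only waits for its child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


if __name__ == "__main__":
    cmds = build_commands(os.getcwd())
    for cmd, code in zip(cmds, run_all(cmds, max_workers=2)):
        if code != 0:
            print("failed with exit code {}: {}".format(code, " ".join(cmd)))
```

The pool keeps at most two jobs running and starts the next one as soon as either finishes, so it behaves like a sliding window rather than strict pairs.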

Related

How to run a program in a python IDE given the file path to the program

I have a program that I need to run in a list of folders. I have the path to the program and a list of all the folders on my computer where I want to run it (there are about 200 of them). I can also change the current working directory to get to the folder I want to be in.
How do I get Python to execute a program from within the IDE once I am in the folder I want to run it in? I don't want to manually open a terminal and type "[Program name] [Path to file where I want to run the program]" 200+ times. My code is below:
import os
import numpy as np
from astropy.table import Table
cat = '/home/myname/catalogue.csv'
cat = Table.read(cat, format="ascii")
ID = np.array(cat['ID'])
ID = ID.astype(str)
folder_path = np.array([])
for i in ID:
    folder_path = np.append(folder_path, '/home/myname/python_stuff/{}/'.format(i))
# sort the IDs with the same order so zip() keeps each folder paired with its own ID
order = folder_path.argsort()
folder_path = folder_path[order]
ID = ID[order]
for i in zip(folder_path, ID):
    os.chdir(i[0])
    name = i[1] + ".setup"
Essentially, after the last line of my code I want another line that is the Python equivalent of "run [program name] on name" (where name is the name of the file in each folder that I want it to use).
You can run the program as a subprocess and set its working directory:
import subprocess
p = subprocess.Popen(["python","some_script.py"], stdout=subprocess.PIPE, cwd=PATH)
output = p.communicate()[0]
cwd sets the current working directory of the child process, so you can run your script from another script in a specific directory.
See https://docs.python.org/3/library/subprocess.html#popen-constructor for how to run a subprocess in a different directory.
import subprocess
for i in zip(folder_path, ID):
    os.chdir(i[0])
    name = i[1] + ".setup"
    subprocess.call("[path to program] %s" % (name), shell=True)
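Combining the two answers, a sketch that passes `cwd=` to `subprocess.run` instead of calling `os.chdir`, and an argument list instead of `shell=True`. Assumptions: `program` is a placeholder for your executable and should be an absolute path (the subprocess docs note that how a relative program path interacts with `cwd` is platform-dependent), `folder_id_pairs` is the `zip(folder_path, ID)` from the question, and `run_in_folders` is a hypothetical name.

```python
import subprocess


def run_in_folders(program, folder_id_pairs):
    """Run `program <ID>.setup` inside each folder and return the exit codes.

    cwd= sets only the child's working directory; the parent's stays unchanged.
    """
    codes = []
    for folder, id_ in folder_id_pairs:
        result = subprocess.run([program, id_ + ".setup"], cwd=folder)
        codes.append(result.returncode)
    return codes
```

Usage would be along the lines of `run_in_folders("/path/to/program", zip(folder_path, ID))`, with the placeholder path replaced by the real one.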

subprocess.run() inside a loop

I would like to loop over files using subprocess.run(), something like:
import os
import subprocess
path = os.chdir("/test")
files = []
for file in os.listdir(path):
    if file.endswith(".bam"):
        files.append(file)
for file in files:
    process = subprocess.run("java -jar picard.jar CollectHsMetrics I=file", shell=True)
How do I correctly call the files?
shell=True is insecure if you include user input in the command. @eatmeimadanish's answer allows anybody who can write a file in /test to execute arbitrary code on your machine. This is a huge security vulnerability!
Instead, pass a list of command-line arguments to subprocess.run. You likely also want to pass check=True; otherwise, your program will finish without an exception if the java command fails!
import os
import subprocess
os.chdir("/test")
for file in os.listdir("."):
    if file.endswith(".bam"):
        subprocess.run(
            ["java", "-jar", "picard.jar", "CollectHsMetrics", "I=" + file], check=True)
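To see why the list form is safe, a small sketch: it uses the Python interpreter itself as a stand-in child process (not java, so it runs anywhere) and returns the single argument the child received. The helper name `argument_seen_by_child` is hypothetical.

```python
import subprocess
import sys


def argument_seen_by_child(arg):
    """Start a tiny Python child with arg as its only argument; return what it printed.

    With a list and no shell=True, no shell parses the command line, so spaces,
    ';' and '$(...)' inside arg are not interpreted as shell syntax.
    """
    child = [sys.executable, "-c", "import sys; print(sys.argv[1])", arg]
    result = subprocess.run(child, capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")
```

A file name such as `a b; echo pwned.bam` comes back unchanged as one argument instead of being split into two shell commands.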
It seems like you might be overcomplicating it.
import os
import subprocess
os.chdir("/test")  # os.chdir() returns None, so don't assign its result
for file in os.listdir("."):
    if file.endswith(".bam"):
        subprocess.run("java -jar picard.jar CollectHsMetrics I={}".format(file), shell=True)

Read argument with spaces from windows cmd Python

I'm trying to read arguments that contain spaces in Windows cmd.
Here is the code:
from avl_tree import *
import sys, os
if __name__ == '__main__':
    avl = AVLTreeMap()
    infile = sys.argv[1] + '.txt'
    avl._preprocessing(infile)
    avl._interface(infile)
I've written it as sys.argv[1] because I type the following in cmd:
python filename.py textfilename
But if the text file has spaces in its name, that doesn't work.
Any suggestions?
Thanks in advance.
This is a very hacky fix, and I wouldn't necessarily suggest it because it will interfere with any other arguments you might add later, but you could do something like this:
infile = " ".join(sys.argv[1:]) + '.txt'
So if you run the program like this:
python filename.py my file name
infile will equal "my file name.txt"
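The same idea can be sketched with `argparse` (swapped in here, not part of the original answer): `nargs="+"` collects all remaining words into a list. Note that quoting the name in cmd, as in `python filename.py "my file name"`, already delivers it as a single `sys.argv[1]` and avoids the hack entirely; joining words also collapses runs of spaces into one. `parse_infile` is a hypothetical helper name.

```python
import argparse


def parse_infile(argv=None):
    """Return the .txt file name built from the command-line words.

    argv=None makes argparse read sys.argv[1:].
    """
    parser = argparse.ArgumentParser(description="Load a text file into the AVL tree.")
    # nargs="+" gathers every remaining word, so an unquoted name with spaces still works
    parser.add_argument("name", nargs="+", help="text file name without the .txt extension")
    args = parser.parse_args(argv)
    return " ".join(args.name) + ".txt"
```

In the question's script, `infile = sys.argv[1] + '.txt'` would become `infile = parse_infile()`.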

Need a script to iterate over files and execute a command

Please bear with me: I haven't used Python before, I'm trying to get some rendering done as quickly as possible, and this has stopped me in my tracks.
I'm outputting the .ifd files to a network drive (Z:), and they are stored in a folder structure like this:
Z:
- \0001
- \0002
- \0003
I need to iterate over the ifd files within a single folder, but the number of files is not fixed, so there also needs to be a definable range (1-300, 1-2500, etc.). The script therefore has to accept two additional arguments for the start and end of the range.
On each iteration it executes something called 'mantra' using this command:
mantra -f file.FRAMENUMBER.ifd outputFile.FRAMENUMBER.png
I found a script on the internet that is supposed to do something similar:
import sys, os
# import command line args
args = sys.argv
# get args as strings
szEndRange = args.pop()
szStartRange = args.pop()
# convert args to int
nStartRange = int(szStartRange, 10)
nEndRange = int(szEndRange, 10)
nOrd = len(szStartRange)
# generate ID range
arVals = range(nStartRange, nEndRange + 1)
for nID in arVals:
    szFormat = 'mantra -V a -f testDebris.%%(id)0%(nOrd)dd.ifd' % {"nOrd": nOrd}
    line = szFormat % {"id": nID}
    os.system(line)
The problem I'm having is that I can't get it to work. It seems to iterate and do something, but it looks like it's just spitting out ifds into a different folder somewhere.
TL;DR:
I need a script that takes at least two arguments:
startFrame
endFrame
and from those creates a frameRange, which is then used to iterate over all ifd files, executing the following command:
mantra -f fileName.currentframe.ifd fileName.currentFrame.png
If I could also specify the filename, the input directory, and the output directory, that would be great too. I've tried doing that manually, but there must be some convention I don't know, because it kept throwing errors when I tried (stopping at the colon).
If anyone could hook me up or point me in the right direction, that'd be swell. I know I should try to learn Python, but I'm at my wits' end with the rendering and need a helping hand.
import os, subprocess, sys
if len(sys.argv) != 3:
    print('Must have 2 arguments!')
    print('Correct usage is "python answer.py input_dir output_dir"')
    sys.exit(1)
input_dir = sys.argv[1]
output_dir = sys.argv[2]
input_file_extension = '.ifd'
# iterate over the contents of the directory
for f in os.listdir(input_dir):
    # separate filename from extension, e.g. "fileName.0001" and ".ifd"
    file_name, file_ext = os.path.splitext(f)
    if file_ext != input_file_extension:
        continue  # skip anything that is not an .ifd file
    # create args: <input_dir>/fileName.0001.ifd -> <output_dir>/fileName.0001.png
    input_str = os.path.join(input_dir, f)
    output_str = os.path.join(output_dir, file_name + '.png')
    cli_args = ['mantra', '-f', input_str, output_str]
    # call mantra; a non-zero exit status means it failed
    if subprocess.call(cli_args):
        print('An error has occurred with command "%s"' % ' '.join(cli_args))
This should work as-is or with slight modification.
Instead of explicitly passing a start and end range, you could just count the files:
import os
# os.walk() returns a generator; next() takes its first (dirpath, dirnames, filenames) tuple
path, dirs, files = next(os.walk("/Your/Path/Here"))
nEndRange = len(files)
# generate ID range
arVals = range(1, nEndRange + 1)
os.walk() yields (dirpath, dirnames, filenames) tuples; the first one describes the folder you specified, so len(files) is the number of files directly inside it.
An even easier way to get your desired output is this:
import os
input_dir = 'dirname'
for filename in os.listdir(input_dir):
    # fileName.0001.ifd -> fileName.0001.png
    outname = os.path.splitext(filename)[0] + '.png'
    line = 'mantra -f {} {}'.format(os.path.join(input_dir, filename), outname)
    os.system(line)
Because os.listdir() returns every entry in the specified directory and the loop visits each filename, you don't even need to count them.
A little help building the command:
# filename is the base name of the .ifd files (e.g. testDebris) and
# arVals is the frame range built in the question's script
for nID in arVals:
    command = 'mantra -V a -f '
    infile = '{0}.{1:04d}.ifd '.format(filename, nID)
    outfile = '{0}.{1:04d}.png'.format(filename, nID)
    os.system(command + infile + outfile)
And definitely use os.walk or os.listdir, as @logic recommends:
import os
for file in os.listdir("Z:"):
    if not file.endswith('.ifd'):
        continue
    filebase = os.path.splitext(file)[0]
    command = 'mantra -V a -f {0}.ifd {0}.png'.format(filebase)
    os.system(command)
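For the TL;DR in the question, a minimal sketch that builds the frame range from startFrame and endFrame and runs `mantra -f <name>.<frame>.ifd <name>.<frame>.png` once per frame, passing the arguments as a list. Assumptions: frame numbers are zero-padded to 4 digits (as in `0001`), the default base name `testDebris` comes from the found script, only the `-f` flag shown in the question is used, and `mantra_commands` and `main` are hypothetical names.

```python
import os
import subprocess


def mantra_commands(base, start, end, input_dir=".", output_dir=".", pad=4):
    """Build one mantra argument list per frame from start to end inclusive.

    Frame numbers are zero-padded to `pad` digits (assumed 4, e.g. 0001).
    """
    commands = []
    for frame in range(start, end + 1):
        stem = "{0}.{1:0{2}d}".format(base, frame, pad)  # e.g. testDebris.0001
        commands.append([
            "mantra", "-f",
            os.path.join(input_dir, stem + ".ifd"),
            os.path.join(output_dir, stem + ".png"),
        ])
    return commands


def main(argv):
    """argv: [script, startFrame, endFrame] or [..., fileName, input_dir, output_dir]."""
    if len(argv) not in (3, 6):
        print("usage: render.py startFrame endFrame [fileName input_dir output_dir]")
        return 2
    start, end = int(argv[1]), int(argv[2])
    base, input_dir, output_dir = argv[3:6] if len(argv) == 6 else ("testDebris", ".", ".")
    for cmd in mantra_commands(base, start, end, input_dir, output_dir):
        if subprocess.call(cmd) != 0:
            print("mantra failed: " + " ".join(cmd))
    return 0
```

Saved as, say, render.py with `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard, it would be called like `python render.py 1 300 testDebris Z:\0001 Z:\renders` (the output folder name here is hypothetical). Directories given on the command line need no extra escaping, because the backslash-escape rules only apply to string literals written inside Python code.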

Run command in python

I am writing a Python script that iterates over a directory and runs a command on each file, like this:
import os
import shutil
import uuid
def process_file(filepath):
    # get the file name and extension
    name, ext = os.path.splitext(os.path.basename(filepath))
    # copy the file to a tmp file, because the filepath may contain non-ASCII characters
    tmp_file_src = 'c:/' + str(uuid.uuid1()) + '.xx'
    tmp_file_dest = tmp_file_src.replace('.xx', '.yy')
    shutil.copy2(filepath, tmp_file_src)
    # run the command, which may take 10-20 seconds
    os.system('xx %s %s' % (tmp_file_src, tmp_file_dest))
    # copy the generated output file back, restoring the original name
    shutil.copy2(tmp_file_dest, os.path.dirname(filepath) + '/' + name + '.yy')
# directory is the folder to scan; process_file must be defined before this loop runs
for root, directories, files in os.walk(directory):
    for filename in files:
        if filename.endswith('.xx'):
            filepath = os.path.join(root, filename)
            process_file(filepath)
As you can see, each file gets one command, and I have to wait for the command to finish completely before doing the next job.
Right now the execution is sequential:
file1 --> file2 --> ...
I wonder if they can be executed in parallel instead:
file1
file2
...
You can also use the threading module:
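import os
import threading
def starter_function(cmd_to_execute):
    os.system(cmd_to_execute)
# cmd_to_execute is the command string you would otherwise pass to os.system
execution_thread = threading.Thread(target=starter_function, args=(cmd_to_execute,))
execution_thread.start()  # returns immediately; call execution_thread.join() to wait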
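To parallelize the whole loop rather than start one thread by hand, a sketch using `concurrent.futures.ThreadPoolExecutor` (a standard-library alternative swapped in here, not the answer's own code). The question's `process_file` can be passed as `worker`; its uuid-based temp names are meant to keep concurrent copies from colliding. `find_files`, `process_all`, and `max_workers=4` are hypothetical choices.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def find_files(directory, suffix=".xx"):
    """Collect every file under directory whose name ends with suffix, sorted."""
    paths = []
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            if filename.endswith(suffix):
                paths.append(os.path.join(root, filename))
    return sorted(paths)


def process_all(paths, worker, max_workers=4):
    """Call worker(path) for each path with up to max_workers running at once.

    Results come back in the same order as paths; an exception raised by
    worker is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, paths))
```

With the question's code this would be `process_all(find_files(directory), process_file)`.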
