Submitting jobs using Python

I am trying to submit a job to a cluster at our institute using Python scripts.
compile_cmd = 'ifort -openmp ran_numbers.f90 ' + fname \
+ ' ompscmf.f90 -o scmf.o'
subprocess.Popen(compile_cmd, shell=True)
Popen('qsub launcher',shell=True)
The problem is that the system hangs at this point. Are there any obvious mistakes in the above script? All the files mentioned in the code are available in that directory (I have cross-checked that). qsub is the command used to submit jobs to our cluster, and fname is the name of a file that I create earlier in the process.

I have a script that I used to submit multiple jobs to our cluster using qsub. qsub typically takes job submissions in the form
qsub [qsub options] job
In my line of work, job is typically a bash script (.sh) or a Python script (.py) that actually calls the programs or code to be run on each node. If I wanted to submit a job called "test_job.sh" with maximum walltime, I would do
qsub -l walltime=72:00:00 test_job.sh
This amounts to the following Python code:
from subprocess import call
qsub_call = "qsub -l walltime=72:00:00 %s"
call(qsub_call % "test_job.sh", shell=True)
Alternatively, what if you had a bash script that looked like
#!/bin/bash
filename="your_filename_here"
ifort -openmp ran_numbers.f90 $filename ompscmf.f90 -o scmf.o
then submitted this via qsub job.sh?
Edit: Honestly, the optimal job queueing scheme varies from cluster to cluster. One simple way to streamline your job submission scripts is to find out how many CPUs are available on each node. Some of the more recent queueing systems let you submit many single-CPU jobs and will pack them together onto as few nodes as possible; however, some older clusters won't do that, and submitting many individual jobs is frowned upon.
Say that each node in your cluster has 8 CPUs. You could write your script like
#!/bin/bash
#PBS -l nodes=1:ppn=8
for ((i=0; i<8; i++))
do
./myjob.sh filename_${i} &
done
wait
What this will do is run 8 jobs on one node at once (& means run in the background) and wait until all 8 jobs have finished. This may be a good fit for clusters with many CPUs per node (for example, one cluster that I used has 48 CPUs per node).
Alternatively, if submitting many single-core jobs is optimal and your submission code above isn't working, you could use Python to generate bash scripts to pass to qsub.
#!/usr/bin/env python
import os
from subprocess import call

bash_lines = ['#!/bin/bash\n', '#PBS -l nodes=1:ppn=1\n']
bash_name = 'myjob_%i.sh'
job_call = 'ifort -openmp ran_numbers.f90 %s ompscmf.f90 -o scmf.o &\n'
qsub_call = 'qsub myjob_%i.sh'

# directory should point at the folder that holds your input files
filenames = [os.path.join(root, f) for root, _, files in os.walk(directory)
             for f in files if f.endswith('.txt')]

for i, filename in enumerate(filenames):
    with open(bash_name % i, 'w') as bash_file:
        bash_file.writelines(bash_lines + [job_call % filename, 'wait\n'])
    call(qsub_call % i, shell=True)

Did you get any errors? It looks like you missed the "subprocess." prefix on the second Popen.
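For reference, a minimal corrected sketch of the snippet from the question might look like this; the value of fname here is a hypothetical placeholder for whatever file name is built earlier, and the fix is simply to wait for the compile step and to use the subprocess. prefix consistently:
import subprocess

fname = 'generated_input.f90'  # hypothetical placeholder for the file created earlier

compile_cmd = 'ifort -openmp ran_numbers.f90 ' + fname + ' ompscmf.f90 -o scmf.o'
# call() blocks until the compilation has finished
subprocess.call(compile_cmd, shell=True)

# note the subprocess. prefix that the original second Popen was missing
subprocess.Popen('qsub launcher', shell=True)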

Related

SLURM: how to run the same python script for different $arg from a catalogue in parallel

I have to run a series of Python scripts for about 10'000 objects. Each object is characterised by arguments in a row of my catalogue.
On my computer, to test the scripts, I was simply using a bash file like:
totrow=`wc -l < catalogue.txt`

for (( i=1; i<=${totrow}; i++ )); do
    arg1=$(awk 'NR=='${i}' {print $1}' catalogue.txt)
    arg2=$(awk 'NR=='${i}' {print $2}' catalogue.txt)
    arg3=$(awk 'NR=='${i}' {print $3}' catalogue.txt)
    python3 script1.py ${arg1} ${arg2} ${arg3}
done
that runs the script for each row of the catalogue.
Now I want to run everything on a supercomputer (with a slurm system).
What I would like to do is run, e.g., 20 objects on 20 CPUs at the same time (so 20 rows at a time), and continue in this way through the entire catalogue.
Any suggestions?
Thanks!
You could set this up as an array job. Put the inner part of your loop into a something.slurm file, and set i equal to the array task ID ($SLURM_ARRAY_TASK_ID) at the top of that file (a .slurm file is just a normal shell script with job information encoded in comments). Then use sbatch --array=1-$totrow something.slurm to launch the jobs.
This will schedule each Python call as a separate task, and number them from 1 to $totrow. SLURM will run each of them on the next available CPU, possibly all at the same time.
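To tie this back to Python, a rough sketch of the whole procedure might look like the following. It is only a sketch built on the question's file names (catalogue.txt, script1.py) and the answer's something.slurm; any extra #SBATCH options (partition, walltime, memory) are left out and would depend on your cluster:
import subprocess

# One array task per row of the catalogue.
with open('catalogue.txt') as catalogue:
    totrow = sum(1 for _ in catalogue)

# The inner part of the original bash loop, with the row index taken
# from $SLURM_ARRAY_TASK_ID.
slurm_script = """#!/bin/bash
i=$SLURM_ARRAY_TASK_ID
arg1=$(awk 'NR=='${i}' {print $1}' catalogue.txt)
arg2=$(awk 'NR=='${i}' {print $2}' catalogue.txt)
arg3=$(awk 'NR=='${i}' {print $3}' catalogue.txt)
python3 script1.py ${arg1} ${arg2} ${arg3}
"""

with open('something.slurm', 'w') as f:
    f.write(slurm_script)

# Launch one task per row; SLURM schedules them onto free CPUs.
subprocess.call('sbatch --array=1-{} something.slurm'.format(totrow), shell=True)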

How to execute multiple bash commands in parallel in python

So, I have some code which takes an input and starts a Spark job on the cluster, something like
spark-submit driver.py -i input_path
Now, I have a list of paths and I want to execute all of these simultaneously.
Here is what I tried:
import subprocess

base_command = 'spark-submit driver.py -i %s'
for path in paths:
    command = base_command % path
    subprocess.Popen(command, shell=True)
My hope was that all of the shell commands would be executed simultaneously, but instead I am noticing that it executes one command at a time.
How do I execute all the bash commands simultaneously?
Thanks
This is where Pool comes in; it is designed for just this case. It maps many inputs across a pool of workers automatically. Here is a good resource on how to use it.
import subprocess
from multiprocessing import Pool

def run_command(path):
    command = "spark-submit driver.py -i {}".format(path)
    subprocess.Popen(command, shell=True)

pool = Pool()
pool.map(run_command, paths)
Pool will spread the calls across a pool of worker processes (by default, one per CPU core) and run them concurrently for the given inputs.
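Since Popen itself returns as soon as the process is started, a sketch without Pool also works, assuming you simply want to start every job and then block until they have all finished (paths is the list from the question):
import subprocess

# Launch every spark-submit at once; Popen does not wait for the command to finish.
procs = [subprocess.Popen('spark-submit driver.py -i {}'.format(path), shell=True)
         for path in paths]

# Block until all of them have completed.
for proc in procs:
    proc.wait()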

Running external commands partly in parallel from python (or bash)

I am running a Python script which creates a list of commands that should be executed by a compiled (proprietary) program.
The program can split some of the calculations into independent runs, and the data is then collected afterwards.
I would like to run these calculations in parallel, as each is a very time-consuming single-threaded task and I have 16 cores available.
I am using subprocess to execute the commands (inside a class):
def run_local(self):
    p = Popen(["someExecutable"], stdout=PIPE, stdin=PIPE)
    p.stdin.write(self.exec_string)
    p.stdin.flush()
    while p.poll() is None:
        line = p.stdout.readline()
        self.log(line)
Where self.exec_string is a string of all the commands.
This string can be split into an initial part, the part I want parallelised, and a finishing part.
How should I go about this?
Also, it seems the executable will "hang" (waiting for a command, e.g. "exit", which releases the memory) if a naive copy-paste of the current method is used for each part.
Bonus: the executable also has the option to run a bash script of commands, if it is easier/possible to parallelise in bash?
For bash, it could be very simple. Assuming your file looks like this:
## init part##
ls
cd ..
ls
cat some_file.txt
## parallel ##
heavycalc &
heavycalc &
heavycalc &
## finish ##
wait
cat results.txt
With & after a command you tell bash to run it in the background. wait then waits for all background jobs to finish, so you can be sure all calculations are done.
I've assumed your input txt file contains plain bash commands.
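If you would rather stay in Python, here is a rough sketch of the same split, assuming the command string can be cut into an init part, a list of independent middle chunks, and a finish part, and that the executable exits when its input ends (otherwise append its own exit command to each chunk); split_exec_string() is a hypothetical helper you would write for your command format:
from subprocess import Popen, PIPE

def run_part(commands):
    # Feed one chunk of commands to its own instance of the executable and wait for it.
    p = Popen(["someExecutable"], stdin=PIPE)
    p.communicate(commands)  # closes stdin, so the program sees EOF and can exit

init_part, parallel_parts, finish_part = split_exec_string()  # hypothetical splitter

run_part(init_part)

# Run the independent middle chunks at the same time, one process per chunk.
procs = []
for i, commands in enumerate(parallel_parts):
    log = open("part_%d.log" % i, "w")
    p = Popen(["someExecutable"], stdin=PIPE, stdout=log)
    p.stdin.write(commands)
    p.stdin.close()  # sending EOF avoids the "hang waiting for exit" issue
    procs.append(p)
for p in procs:
    p.wait()

run_part(finish_part)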
Using GNU Parallel:
## init
cd foo
cp bar baz
## parallel ##
parallel heavycalc ::: file1 file2 file3 > results.txt
## finish ##
cat results.txt
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Launch a single python script as different processes differing by command line arguments

I have a Python script that takes command-line arguments. The way I get the command-line arguments is by reading a Mongo database. I need to iterate over the Mongo query and launch a separate process of the single script for each result, with different command-line arguments taken from the query.
The key is, I need the launched processes to be:
separate processes that share nothing
easy to kill: when killing the processes, I need to be able to kill them all easily.
I think the command killall -9 script.py would work and would satisfy the second constraint.
Edit 1
From the answer below, the launcher.py program looks like this
import subprocess

def main():
    symbolPreDict = initializeGetMongoAllSymbols()
    keys = sorted(symbolPreDict.keys())
    for symbol in keys:
        # Display key.
        print(symbol)
        command = ['python', 'mc.py', '-s', str(symbol)]
        print(command)
        subprocess.call(command)

if __name__ == '__main__':
    main()
The problem is that mc.py has a call that blocks
receiver = multicast.MulticastUDPReceiver("192.168.0.2", symbolMCIPAddrStr, symbolMCPort)
while True:
    try:
        b = MD()
        data = receiver.read()  # This blocks
        ...
    except Exception, e:
        print str(e)
When I run the launcher, it just executes one instance of mc.py (there are at least 39). How do I modify the launcher program to run the launched script in the background, so that control returns to the launcher and it can launch more scripts?
Edit 2
The problem is solved by replacing subprocess.call(command) with subprocess.Popen(command).
One thing I noticed, though: if I say ps ax | grep mc.py, the PIDs all seem to be different. I don't think I care, since I can kill them all pretty easily with killall.
[Correction] kill them with pkill -f xxx.py
There are several options for launching scripts from a script. The easiest are probably to use the subprocess or os modules.
I have done this several times to launch things to separate nodes on a cluster. Using os it might look something like this:
import os

for i in range(len(operations)):
    # arg1 and arg2 would be pulled from operations[i]
    os.system("python myScript.py {:} {:} > out.log".format(arg1, arg2))
Using killall, you should have no problem terminating processes spawned this way.
Another option is to use subprocess, which has a wide range of features and is much more flexible than os.system. An example might look like:
import subprocess

for i in range(len(operations)):
    command = ['python', 'myScript.py', 'arg1', 'arg2']
    # call() waits for each script to finish; use Popen() to launch them concurrently
    subprocess.call(command)
In both of these methods, the processes are independent and share nothing other than a parent PID.
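If the easy-kill requirement matters, a variation that keeps the Popen handles lets you terminate everything you started without reaching for killall; myScript.py and operations are the placeholders from the examples above:
import subprocess

procs = []
for args in operations:  # assuming operations is a list of argument lists
    procs.append(subprocess.Popen(['python', 'myScript.py'] + list(args)))

# ... later, stop every child that is still running
for p in procs:
    if p.poll() is None:  # None means the process has not exited yet
        p.terminate()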

How to structure code that distributes jobs to threads/nodes in Python?

I have Python code that takes a bunch of tasks and distributes them to either different threads or different nodes in a cluster. I always end up writing a main script, driver.py, that takes two command-line arguments: --run-all and --run-task. The first is just a wrapper that iterates through all tasks and then calls driver.py --run-task with each task passed as an argument. Example:
== driver.py ==
# Determine the current script
DRIVER = os.path.abspath(__file__)

(opts, args) = parser.parse_args()
if opts.run_all is not None:
    # Run all tasks
    for task in opts.run_all.split(","):
        # Call driver.py again with a specific task
        cmd = "python %s --run-task %s" % (DRIVER, task)
        # Execute on system
        distribute_cmd(cmd)
elif opts.run_task is not None:
    # Run on an individual task
    # code here for processing a task...
The user would then call:
$ driver.py --run-all task1,task2,task3,task4
And each task would get distributed.
The function distribute_cmd takes a shell executable command and sends it in a system-specific way to either a node or a thread. The reason driver.py has to find its own name and call itself is that distribute_cmd needs an executable shell command; it cannot take, for example, a function name.
This consideration led me to this design of a driver script that has two modes and has to call itself. This has two complications: (1) the script has to find out its own path via __file__, and (2) when making this into a Python package, it's unclear where driver.py should go. It's meant to be an executable script, but if I put it in setup.py's scripts=, then I will have to find out where the scripts live (see correct way to find scripts directory from setup.py in Python distutils?). This does not seem to be a good solution.
What's an alternative design to this? Keep in mind that the distribution of tasks has to result in an executable command that can be passed as a string to distribute_cmd. Thanks.
What you are looking for is a library that already does exactly what you need, e.g. Fabric or Celery.
If you were not using nodes, I would suggest using multiprocessing.
This is a slightly similar question to this one.
To be able to execute remotely, you either need:
ssh access to the box; in that case you can use Fabric to send your commands.
a server, SocketServer, TCP server, or anything that will accept connections.
an agent, or client, that will wait for data. If you are using an agent, you may as well use a broker for your messages. Celery does some of this plumbing for you: one end puts messages on the queue while the other end gets messages from the queue. If the message is a command to execute, then the agent can do an os.system() call, or call subprocess.Popen().
celery example:
import os
from celery import Celery

celery = Celery('tasks', broker='amqp://guest@localhost//')

@celery.task
def run_command(command):
    return os.system(command)
You will then need a worker that binds on the queue and waits for tasks to execute. More info in the documentation.
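As a rough sketch of the other end (assuming the snippet above is saved as tasks.py; the worker command line and delay() are standard Celery usage, but check the docs for your Celery version):
# Start a worker that consumes from the queue, e.g. on each node:
#   celery -A tasks worker --loglevel=info

# From the driver, enqueue one command per task; delay() returns immediately.
from tasks import run_command

for task in "task1,task2,task3,task4".split(","):
    run_command.delay("python driver.py --run-task %s" % task)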
fabric example:
the code:
from fabric.api import run

def exec_remotely(command):
    run(command)
the invocation:
$ fab exec_remotely:command='ls -lh'
More info in the documentation.
batch system case:
To go back to the question...
distribute_cmd is something that would call bsub somescript.sh
you need to find __file__ only because you are going to re-execute the same script with other parameters
because of the above, you might have a problem providing a correct distutils script.
Let's question this design.
Why do you need to use the same script?
Can your driver write scripts then call bsub?
Can you use temporary files?
Do all the nodes actually share a filesystem?
How do you know the file is going to exist on the node?
example:
TASK_CODE = {
    'TASK1': '''#!/usr/bin/env python
#... actual code for task1 goes here ...
''',
    'TASK2': '''#!/usr/bin/env python
#... actual code for task2 goes here ...
'''}

# driver portion
(opts, args) = parser.parse_args()
if opts.run_all is not None:
    for task in opts.run_all.split(","):
        task_path = '/tmp/taskfile_%s' % task
        with open(task_path, 'w') as task_file:
            task_file.write(TASK_CODE[task])
        # note: should probably do better error handling.
        distribute_cmd(task_path)
