Running external commands partly in parallel from python (or bash) - python

I am running a python script which creates a list of commands which should be executed by a compiled program (proprietary).
The program can split some of the calculations so that they run independently, and the data is then collected afterwards.
I would like to run these calculations in parallel, as each is a very time-consuming single-threaded task and I have 16 cores available.
I am using subprocess to execute the commands (inside a class):
from subprocess import Popen, PIPE

def run_local(self):
    p = Popen(["someExecutable"], stdout=PIPE, stdin=PIPE)
    p.stdin.write(self.exec_string)
    p.stdin.flush()
    while p.poll() is None:          # still running
        line = p.stdout.readline()
        self.log(line)
Where self.exec_string is a string of all the commands.
This string can be split into an initial part, the part I want parallelised, and a finishing part.
How should I go about this?
Also, it seems the executable will "hang" (waiting for a command, e.g. "exit", which releases the memory) if a naive copy-paste of the current method is used for each part.
Bonus: the executable also has the option to run a bash script of commands, in case it is easier/possible to parallelise from bash.

For bash, it could be very simple. Assuming your file looks like this:
## init part ##
ls
cd ..
ls
cat some_file.txt
## parallel ##
heavycalc &
heavycalc &
heavycalc &
## finish ##
wait
cat results.txt
With & after a command you tell bash to run that command in the background. wait will then wait for all background jobs to finish, so you can be sure all calculations are done.
I've assumed your input txt file contains plain bash commands.
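If the commands have to be fed to the executable's stdin, as in run_local above, a rough Python equivalent of the same init/parallel/finish split could look like the sketch below. It assumes (and this is an assumption about the proprietary program, not something stated in the question) that each parallel chunk can be given to its own instance of someExecutable together with the init commands and a final "exit":
from concurrent.futures import ThreadPoolExecutor
from subprocess import Popen, PIPE

def run_chunk(script):
    # one instance of the executable per chunk; communicate() writes stdin,
    # reads stdout and waits, so "exit" is sent and the memory is released
    p = Popen(["someExecutable"], stdin=PIPE, stdout=PIPE)
    out, _ = p.communicate(script.encode())
    return out.decode()

def run_split(init_part, parallel_parts, finish_part, workers=16):
    # run every parallel chunk in its own process, at most `workers` at a time
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scripts = [init_part + chunk + "exit\n" for chunk in parallel_parts]
        results = list(pool.map(run_chunk, scripts))
    # once all chunks are done, run the finishing commands in one last instance
    final = run_chunk(init_part + finish_part + "exit\n")
    return results, final
The threads here only wait on the external processes, so the GIL is not a problem; run_chunk and run_split are names invented for this sketch.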

Using GNU Parallel:
## init ##
cd foo
cp bar baz
## parallel ##
parallel heavycalc ::: file1 file2 file3 > results.txt
## finish ##
cat results.txt
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time.
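For comparison, the same keep-the-workers-busy scheduling can be sketched in plain Python with concurrent.futures: a fixed pool of workers starts the next job as soon as one finishes, instead of pre-assigning 8 jobs per CPU. The job list and the heavycalc command line are placeholders here.
import subprocess
from concurrent.futures import ProcessPoolExecutor

jobs = ["file%d" % i for i in range(32)]   # placeholder list of 32 jobs

def run_job(arg):
    # run one single-threaded job to completion and return its output
    return subprocess.run(["heavycalc", arg], capture_output=True, text=True).stdout

if __name__ == "__main__":
    # 4 workers: whenever one job finishes, the next one is started
    with ProcessPoolExecutor(max_workers=4) as pool:
        for output in pool.map(run_job, jobs):
            print(output, end="")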
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Related

SLURM: how to run the same python script for different $arg from a catalogue in parallel

I have to run a series of python scripts for about 10'000 objects. Each object is characterised by arguments in a row of my catalogue.
On my computer, to test the scripts, I was simply using a bash file like:
totrow=`wc -l < catalogue.txt`
for (( i=1; i<=${totrow}; i++ )); do
    arg1=$(awk 'NR=='${i}'{print $1}' catalogue.txt)
    arg2=$(awk 'NR=='${i}'{print $2}' catalogue.txt)
    arg3=$(awk 'NR=='${i}'{print $3}' catalogue.txt)
    python3 script1.py ${arg1} ${arg2} ${arg3}
done
that runs the script for each row of the catalogue.
Now I want to run everything on a supercomputer (with a slurm system).
What I would like to do is run e.g. 20 objects on 20 CPUs at the same time (so 20 rows at a time) and continue in this way through the entire catalogue.
Any suggestions?
Thanks!
You could set this up as an array job. Put the inner part of your loop into a something.slurm file, and set i equal to the array task ID ($SLURM_ARRAY_TASK_ID) at the top of this file (a .slurm file is just a normal shell script with job information encoded in comments). Then use sbatch --array=1-$totrow something.slurm to launch the jobs.
This will schedule each Python call as a separate task, and number them from 1 to $totrow. SLURM will run each of them on the next available CPU, possibly all at the same time.
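If you would rather keep the row selection in Python than in awk, the .slurm script can stay a one-line python3 call and the script can read the array index itself. A minimal sketch, assuming catalogue.txt has three whitespace-separated columns as in the question:
import os, sys

# SLURM sets this for each task of an array job (1-based here)
task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])

with open("catalogue.txt") as cat:
    for rownum, row in enumerate(cat, start=1):
        if rownum == task_id:
            arg1, arg2, arg3 = row.split()[:3]   # assumes three columns per row
            break
    else:
        sys.exit("no row %d in catalogue.txt" % task_id)

# ... continue with the existing per-object code using arg1, arg2, arg3 ...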

Parallel SGE queue does not execute python code

I am currently using a cluster which runs SGE. There, I submit a .sh script which calls a python script (parallelised with multiprocessing.Pool) to a parallel queue by calling qsub run.sh. The python script itself prints some kind of progress via print(...), which then appears in the output file created by SGE. Now there is a huge problem: when I execute the script manually everything works like a charm, but when I use the parallel queue, at some (random) iteration the pool workers seem to stop working, as no further progress can be seen in the output file. Furthermore, the CPU usage suddenly drops to 0% and all threads of the script are just idling.
What can I do to solve this problem? Or how can I even debug it? As there are no error messages in the output file, I am really confused.
Edit: here are the relevant parts of the shell script that is added to the queue, and the necessary python files.
main.sh:
#!/bin/bash
# Use bash as shell
#$ -S /bin/bash
# Preserve environment variables
#$ -V
# Execute from current working directory
#$ -cwd
# Merge standard output and standard error into one file
#$ -j yes
# Standard name of the job (if none is given on the command line):
#$ -N vh_esn_gs
# Path for the output files
#$ -o /home/<username>/q-out/
# Limit memory usage
#$ -hard -l h_vmem=62G
# array range
#$ -t 1-2
# parallel
#$ -pe <qname> 16
#$ -q <qname>
python mainscript.py
mainscript.py:
# read parameters etc [...]

def mainFunction():
    worker = ClassWorker(...)
    worker.startparallel()

if __name__ == '__main__':
    mainFunction()
where ClassWorker is defined like this:
from multiprocessing import Pool, Queue

class ClassWorker:
    def _get_score(data):
        params, fixed_params, trainingInput, trainingOutput, testingDataSequence, esnType = data
        [... (the calculation is performed)]
        dat = (test_mse, training_acc, params)
        ClassWorker._get_score.q.put(dat)
        return dat

    def _get_score_init(q):
        ClassWorker._get_score.q = q

    def startparallel():
        queue = Queue()
        pool = Pool(processes=n_jobs, initializer=ClassWorker._get_score_init, initargs=[queue,])
        [... (set up the jobs)]
        [start an async thread that watches for incoming results in the queue to update the progress]
        results = pool.map(ClassWorker._get_score, jobs)
        pool.close()
Maybe this helps to spot the problem. I did not include the real calculation part, as it has not caused any trouble on the cluster so far, so this should be safe.

Submitting jobs using python

I am trying to submit a job in a cluster in our institute using python scripts.
compile_cmd = 'ifort -openmp ran_numbers.f90 ' + fname \
+ ' ompscmf.f90 -o scmf.o'
subprocess.Popen(compile_cmd, shell=True)
Popen('qsub launcher',shell=True)
The problem is that the system hangs at this point. Are there any obvious mistakes in the above script? All the files mentioned in the code are available in that directory (I have cross-checked that). qsub is the command used to submit jobs to our cluster. fname is the name of a file that I created in the process.
I have a script that I used to submit multiple jobs to our cluster using qsub. qsub typically takes job submissions in the form
qsub [qsub options] job
In my line of work, job is typically a bash (.sh) or python script (.py) that actually calls the programs or code to be run on each node. If I wanted to submit a job called "test_job.sh" with maximum walltime, I would do
qsub -l walltime=72:00:00 test_job.sh
This amounts to the following python code
from subprocess import call
qsub_call = "qsub -l walltime=72:00:00 %s"
call(qsub_call % "test_job.sh", shell=True)
Alternatively, what if you had a bash script that looked like
#!/bin/bash
filename="your_filename_here"
ifort -openmp ran_numbers.f90 $filename ompscmf.f90 -o scmf.o
then submitted this via qsub job.sh?
Edit: Honestly, the optimal job queueing scheme varies from cluster to cluster. One simple way to simplify your job submission scripts is to find out how many CPUs are available on each node. Some of the more recent queueing systems allow you to submit many single-CPU jobs and will schedule them together on as few nodes as possible; however, some older clusters won't do that, and submitting many individual jobs is frowned upon.
Say that each node in your cluster has 8 CPUs. You could write your script like
#!/bin/bash
#PBS -l nodes=1:ppn=8
for ((i=0; i<8; i++))
do
    ./myjob.sh filename_${i} &
done
wait
What this will do is run 8 jobs on one node at once (& means run in the background) and wait until all 8 jobs are finished. This may be optimal for clusters with many CPUs per node (for example, one cluster that I used has 48 CPUs per node).
Alternatively, if submitting many single core jobs is optimal and your submission code above isn't working, you could use python to generate bash scripts to pass to qsub.
#!/usr/bin/env python
import os
from subprocess import call

directory = '.'  # set this to the directory containing the input .txt files
bash_lines = ['#!/bin/bash\n', '#PBS -l nodes=1:ppn=1\n']
bash_name = 'myjob_%i.sh'
job_call = 'ifort -openmp ran_numbers.f90 %s ompscmf.f90 -o scmf.o &\n'
qsub_call = 'qsub myjob_%i.sh'

filenames = [os.path.join(root, f) for root, _, files in os.walk(directory)
             for f in files if f.endswith('.txt')]

for i, filename in enumerate(filenames):
    with open(bash_name % i, 'w') as bash_file:
        bash_file.writelines(bash_lines + [job_call % filename, 'wait\n'])
    call(qsub_call % i, shell=True)
Did you get any errors? It seems you missed the "subprocess." prefix at the second Popen.
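For reference, a cleaned-up version of the submission snippet with the missing prefix added; the wait() on the compile step is my addition so that qsub is not issued before scmf.o exists, and the fname value here is just a placeholder for the file created earlier in the question's script:
import subprocess

fname = 'generated_input.f90'   # placeholder for the file created earlier
compile_cmd = 'ifort -openmp ran_numbers.f90 ' + fname + ' ompscmf.f90 -o scmf.o'

compiler = subprocess.Popen(compile_cmd, shell=True)
compiler.wait()                 # make sure the compile finishes before submitting
subprocess.Popen('qsub launcher', shell=True)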

Python - running a file listener from system boot until the system is switched off to do some automated actions, but not working

I have to read a file continuously, from system boot until the system is switched off. Based on the file events, the python script has to take actions. But it's weird that my python script is not doing so.
Step 1: in crontab, to run at boot, I have the following:
tail -f /var/tmp/event.log | python /var/tmp/event.py &
Step 2: other applications are now dumping lines in the file /var/tmp/event.log, for example
java -cp Java.jar Main.main | tee -a /var/tmp/event.log
or
echo "tst" >> /var/tmp/event.log
or
otherapps | tee -a /var/tmp/event.log
Step 3: event.py has a loop that listens for the commands and executes them, but that execution is not happening:
import sys, time, os

while True:
    line = sys.stdin.readline()
    if line:
        if "runme.sh" in line:
            os.system("/var/tmp/runme.sh")
        if "killfirefox.sh" in line:
            os.system("/var/tmp/killfirefox.sh")
        if "shotemail.sh" in line:
            os.system("/var/tmp/shotemail.sh")
        if "scan.sh" in line:
            os.system("/var/tmp/scan.sh")
        if "screenshot.sh" in line:
            os.system("/var/tmp/screenshot.sh")
    else:
        time.sleep(1)
What is it that I am doing wrong? I have verified that event.log has the correct command on each line, and when I manually do echo "runme.sh" | python /var/tmp/event.py it works, so why does it not work when started from crontab at boot time?
Probably because the tail command's output is being buffered. When you write to a pipe in bash, the output is not immediately sent to the other command; it is buffered and output later. For example, try running ls -l | more on a large directory. You'll see a noticeable delay before more outputs the listing.
There are a number of ways around this. You could use stdbuf (see here for more on it) like so:
stdbuf -i0 -e0 -o0 tail -f /var/tmp/event.log | python /var/tmp/event.py
Try this and see if it works better. A more elegant solution would be to use a temporary file that only gets written to when there is output to be read, which is then monitored by the python script, or a named pipe.
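Another way to sidestep the pipe buffering entirely is to drop tail -f and let the python script follow the log file itself. A minimal sketch of that idea, using the paths from the question:
import os, time

def follow(path):
    # yield new lines appended to path, like tail -f
    with open(path) as f:
        f.seek(0, 2)            # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(1)   # nothing new yet
                continue
            yield line

for line in follow("/var/tmp/event.log"):
    if "runme.sh" in line:
        os.system("/var/tmp/runme.sh")
    # ... same checks for the other handler scripts as in the original loop ...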

Parallel processing from a command queue on Linux (bash, python, ruby... whatever)

I have a list/queue of 200 commands that I need to run in a shell on a Linux server.
I only want to have a maximum of 10 processes running (from the queue) at once. Some processes will take a few seconds to complete, other processes will take much longer.
When a process finishes I want the next command to be "popped" from the queue and executed.
Does anyone have code to solve this problem?
Further elaboration:
There are 200 pieces of work that need to be done, in a queue of some sort. I want to have at most 10 pieces of work going on at once. When a thread finishes a piece of work it should ask the queue for the next piece of work. If there's no more work in the queue, the thread should die. When all the threads have died it means all the work has been done.
The actual problem I'm trying to solve is using imapsync to synchronize 200 mailboxes from an old mail server to a new mail server. Some users have large mailboxes and take a long time to sync, others have very small mailboxes and sync quickly.
On the shell, xargs can be used to queue parallel command processing. For example, to always have 3 sleeps running in parallel, each sleeping for 1 second, and to execute 10 sleeps in total, do
echo {1..10} | xargs -d ' ' -n1 -P3 sh -c 'sleep 1s' _
In total it would sleep for 4 seconds. If you have a list of names and want to pass the names to the commands being executed, again running 3 commands in parallel, do
cat names | xargs -n1 -P3 process_name
This would execute process_name alice, process_name bob, and so on.
I would imagine you could do this using make and the make -j xx command.
Perhaps a makefile like this
all : usera userb userc....

usera:
	imapsync usera

userb:
	imapsync userb

....
make -j 10 -f makefile
GNU Parallel is made exactly for this purpose.
cat userlist | parallel imapsync
One of the beauties of Parallel compared to other solutions is that it makes sure output is not mixed. Doing traceroute in Parallel works fine for example:
(echo foss.org.my; echo www.debian.org; echo www.freenetproject.org) | parallel traceroute
PPSS was written for exactly this kind of job: Parallel Processing Shell Script. Google the name and you will find it; I won't linkspam.
GNU make (and perhaps other implementations as well) has the -j argument, which governs how many jobs it will run at once. When a job completes, make will start another one.
Well, if they are largely independent of each other, I'd think in terms of:
Initialize an array of jobs pending (queue, ...) - 200 entries
Initialize an array of jobs running - empty

while (jobs still pending and queue of jobs running still has space)
    take a job off the pending queue
    launch it in background
    if (queue of jobs running is full)
        wait for a job to finish
        remove from jobs running queue

while (queue of jobs running is not empty)
    wait for job to finish
    remove from jobs running queue
Note that the test in the main loop is only re-evaluated after the inner wait has freed a slot in the 'jobs running' queue, which prevents premature termination of the loop. I think the logic is sound.
I can see how to do that in C fairly easily - it wouldn't be all that hard in Perl, either (and therefore not too hard in the other scripting languages - Python, Ruby, Tcl, etc). I'm not at all sure I'd want to do it in shell - the wait command in shell waits for all children to terminate, rather than for some child to terminate.
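The pseudocode above translates fairly directly into Python using subprocess: keep a small list of running Popen handles and start a new command whenever one finishes. A sketch under the same 200-command, 10-at-a-time assumption (the command list is a placeholder):
import subprocess, time

pending = ["imapsync user%d" % i for i in range(200)]   # placeholder command list
running = []
MAX_RUNNING = 10

while pending or running:
    # start new commands while there is work and a free slot
    while pending and len(running) < MAX_RUNNING:
        running.append(subprocess.Popen(pending.pop(0), shell=True))
    # drop anything that has finished, then poll again shortly
    running = [p for p in running if p.poll() is None]
    time.sleep(0.5)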
In python, you could try:
import Queue, os, threading   # Python 2 names; on Python 3 the module is "queue"

# synchronised queue
queue = Queue.Queue(0)   # 0 means no maximum size

# do stuff to initialise queue with strings
# representing os commands
queue.put('sleep 10')
queue.put('echo Sleeping..')
# etc
# or use python to generate commands, e.g.
# for username in ['joe', 'bob', 'fred']:
#     queue.put('imapsync %s' % username)

def go():
    while True:
        try:
            # False here means no blocking: raise exception if queue empty
            command = queue.get(False)
            # Run command. python also has subprocess module which is more
            # featureful but I am not very familiar with it.
            # os.system is easy :-)
            os.system(command)
        except Queue.Empty:
            return

for i in range(10):   # change this to run more/fewer threads
    threading.Thread(target=go).start()
Untested...
(Of course, Python itself effectively runs one thread at a time because of the GIL, but you still get the benefit of multiple threads here, since each thread spends its time waiting for an external command to finish.)
If you are going to use Python, I recommend using Twisted for this.
Specifically Twisted Runner.
https://savannah.gnu.org/projects/parallel (gnu parallel)
and pssh might help.
Python's multiprocessing module would seem to fit your issue nicely. It's a high-level package that provides a threading-like API backed by processes.
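A minimal sketch of that idea for the 200-command case, assuming the commands are plain shell strings (the list here is a placeholder): 10 worker processes pull commands from the list until it is exhausted.
import subprocess
from multiprocessing import Pool

commands = ["imapsync user%d" % i for i in range(200)]   # placeholder queue of commands

def run(cmd):
    # run one command to completion and report its exit status
    return cmd, subprocess.call(cmd, shell=True)

if __name__ == "__main__":
    with Pool(processes=10) as pool:
        for cmd, status in pool.imap_unordered(run, commands):
            print(cmd, "finished with status", status)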
A simple function in zsh to parallelize jobs in no more than 4 subshells, using lock files in /tmp.
The only non-trivial part is the glob qualifiers in the first test:
#q: enable filename globbing in a test
[4]: returns the 4th result only
N: ignore error on empty result
It should be easy to convert it to posix, though it would be a bit more verbose.
Do not forget to escape any quotes in the jobs with \".
#!/bin/zsh
setopt extendedglob

para() {
    lock=/tmp/para_$$_$((paracnt++))
    # sleep as long as the 4th lock file exists
    until [[ -z /tmp/para_$$_*(#q[4]N) ]] { sleep 0.1 }
    # Launch the job in a subshell
    ( touch $lock ; eval $* ; rm $lock ) &
    # Wait for subshell start and lock creation
    until [[ -f $lock ]] { sleep 0.001 }
}

para "print A0; sleep 1; print Z0"
para "print A1; sleep 2; print Z1"
para "print A2; sleep 3; print Z2"
para "print A3; sleep 4; print Z3"
para "print A4; sleep 3; print Z4"
para "print A5; sleep 2; print Z5"

# wait for all subshells to terminate
wait
Can you elaborate on what you mean by "in parallel"? It sounds like you need to implement some sort of locking in the queue so your entries are not selected twice, etc., and each command runs only once.
Most queue systems cheat -- they just write a giant to-do list, then select e.g. ten items, work them, and select the next ten items. There's no parallelization.
If you provide some more details, I'm sure we can help you out.
