Parallel SGE queue does not execute python code - python

I am currently using a cluster which runs SGE. There, I submit a .sh script that calls a Python script (parallelized with multiprocessing.Pool) to a parallel queue by calling qsub run.sh. The Python script prints some kind of progress via print(...), which then appears in the output file created by SGE. Now there is a huge problem: when I execute the script manually, everything works like a charm, but when I use the parallel queue, at some (random) iteration the pool workers seem to stop working, as no further progress appears in the output file. Furthermore, the CPU usage suddenly drops to 0% and all worker processes of the script are just idling.
What can I do to solve this problem? Or how can I even debug it? As there are no error messages in the output file, I am really confused.
Edit: Here are some parts of the shell script which is added to the queue, and the necessary Python files.
main.sh:
#!/bin/bash
# Use bash as shell
#$ -S /bin/bash
# Preserve environment variables
#$ -V
# Execute from current working directory
#$ -cwd
# Merge standard output and standard error into one file
#$ -j yes
# Standard name of the job (if none is given on the command line):
#$ -N vh_esn_gs
# Path for the output files
#$ -o /home/<username>/q-out/
# Limit memory usage
#$ -hard -l h_vmem=62G
# array range
#$ -t 1-2
# parallel
#$ -pe <qname> 16
#$ -q <qname>
python mainscript.py
mainscript.py:
# read parameters etc. [...]

def mainFunction():
    worker = ClassWorker(...)
    worker.startparallel()

if __name__ == '__main__':
    mainFunction()
whereby the ClassWorker is defined like this:
from multiprocessing import Pool, Queue

class ClassWorker:
    @staticmethod
    def _get_score(data):
        params, fixed_params, trainingInput, trainingOutput, testingDataSequence, esnType = data
        [... (the calculation is performed)]
        dat = (test_mse, training_acc, params)
        ClassWorker._get_score.q.put(dat)
        return dat

    @staticmethod
    def _get_score_init(q):
        ClassWorker._get_score.q = q

    def startparallel(self):
        queue = Queue()
        pool = Pool(processes=n_jobs, initializer=ClassWorker._get_score_init, initargs=[queue])
        [... (set up jobs)]
        [start an async thread that watches the queue for incoming results and updates the progress]
        results = pool.map(ClassWorker._get_score, jobs)
        pool.close()
Maybe this helps to spot the problem. I did not include the real calculation part, as it has not caused any trouble on the cluster so far, so this should be safe.
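Since the question also asks how such a hang can be debugged at all, here is a minimal sketch (my addition, not part of the original post) that uses the standard-library faulthandler module to dump Python stack traces of an apparently idle process; the signal choice, the timeout and the log path are arbitrary assumptions for illustration.

import faulthandler
import signal

def enable_hang_debugging(logpath='/tmp/hang_traceback.log'):
    # Keep the file object alive, otherwise faulthandler would write to a closed fd.
    logfile = open(logpath, 'w')
    # On SIGUSR1, write the traceback of every thread in this process.
    faulthandler.register(signal.SIGUSR1, file=logfile, all_threads=True)
    # Additionally dump a traceback automatically after 30 minutes,
    # in case nobody sends the signal by hand.
    faulthandler.dump_traceback_later(timeout=1800, file=logfile)
    return logfile

Calling this at the top of mainscript.py (and inside the pool initializer) lets you run kill -USR1 <pid> against the idle workers from a login shell and see where they are blocked, which in setups like this is often a wait on the multiprocessing queue or pipe.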

Related

Linux python application PID existence check

I have a strange problem with auto-running my python application. As everybody knows, to run this kind of app I need to run the command:
python app_script.py
Now I try to run this app from crontab, using one simple script that checks whether the app is already running. If it is not, the script starts the application.
#!/bin/bash
pidof appstart.py >/dev/null
if [[ $? -ne 0 ]] ; then
    python /path_to_my_app/appstart.py &
fi
The bad side of this approach is that pidof only matches against the first word of the command in the ps aux table, which in this example will always be python, and it skips the script name (appstart). So when I run another app based on Python, the check will fail... Maybe somebody knows how to check this in a proper way?
This might be a question better suited for Unix & Linux Stack Exchange.
However, it's common to use pgrep instead of pidof for applications like yours:
$ pidof appstart.py # nope
$ pidof python # works, but it can be different python
16795
$ pgrep appstart.py # nope, it would match just 'python', too
$ pgrep -f appstart.py # -f is for 'full', it searches the whole commandline (so it finds appstart.py)
16795
From man pgrep: The pattern is normally only matched against the process name. When -f is set, the full command line is used.
Maybe you would be better off checking for a PID file created by your application? This also lets you track different instances of the same script if needed. Something like this:
#!/usr/bin/env python3
import os
import sys
import atexit
PID_file = "/tmp/app_script.pid"
PID = str(os.getpid())
if os.path.isfile(PID_file):
    sys.exit('{} already exists!'.format(PID_file))
open(PID_file, 'w').write(PID)

def cleanup():
    os.remove(PID_file)

atexit.register(cleanup)

# DO YOUR STUFF HERE
After that you'll be able to check whether the file exists, and if it does, you can retrieve the PID of your script.
[ -f /tmp/app_script.pid ] && ps up $(cat /tmp/app_script.pid) >/dev/null && echo "Started" || echo "Not Started"
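One caveat worth adding (my addition, not in the original answer): if the script dies from a SIGKILL or a power loss, atexit never runs and the PID file goes stale, so every later start refuses to run. A small sketch of a staleness check, assuming the same /tmp/app_script.pid path as above:

import os

PID_file = "/tmp/app_script.pid"

def pid_is_running(pid):
    try:
        os.kill(pid, 0)          # signal 0 only checks existence, sends nothing
    except ProcessLookupError:   # no process with that PID
        return False
    except PermissionError:      # process exists but belongs to another user
        return True
    return True

if os.path.isfile(PID_file):
    with open(PID_file) as f:
        old_pid = int(f.read().strip())
    if not pid_is_running(old_pid):
        os.remove(PID_file)      # stale file left over from a killed run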
You could also do the whole thing in Python, without the bash script around it, by creating a pidfile somewhere writable.
import os
import sys

pidpath = os.path.abspath('/tmp/myapp.pid')

def myfunc():
    """
    Your logic goes here
    """
    return

if __name__ == '__main__':
    # check for existing pidfile and fail if true
    if os.path.exists(pidpath):
        print('Script already running.')
        sys.exit(1)
    else:
        # otherwise write current pid to file
        with open(pidpath, 'w') as _f:
            _f.write(str(os.getpid()))
        try:
            # call your function
            myfunc()
        except Exception as e:
            # report what broke
            print('Exception: {}'.format(e))
            sys.exit(1)
        finally:
            # clean up after yourself whether it worked or not
            os.remove(pidpath)

How to execute multiple bash commands in parallel in python

So, I have some code which takes an input and starts a Spark job on the cluster. Something like
spark-submit driver.py -i input_path
Now, I have a list of paths and I want to execute all of these simultaneously.
Here is what I tried:
import subprocess

base_command = 'spark-submit driver.py -i %s'

for path in paths:
    command = base_command % path
    subprocess.Popen(command, shell=True)
My hope was that all of the shell commands would be executed simultaneously, but instead I am noticing that it executes one command at a time.
How do I execute all the bash commands simultaneously?
Thanks
This is where Pool comes in; it is designed for just this case: it maps many inputs over a pool of workers automatically. The multiprocessing documentation is a good resource on how to use it.
import subprocess
from multiprocessing import Pool

def run_command(path):
    command = "spark-submit driver.py -i {}".format(path)
    subprocess.Popen(command, shell=True)

pool = Pool()
pool.map(run_command, paths)
This will spread the items in paths over a pool of worker processes (one per CPU core by default) and launch the commands at the same time.
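One caveat (my addition, not part of the original answer): subprocess.Popen returns immediately, so the workers do not actually wait for each spark-submit to finish, and the pool size does not bound how many jobs run at once. A sketch that blocks on each command, so at most as many jobs run concurrently as there are pool workers; the pool size of 4 and the example paths are placeholders:

import subprocess
from multiprocessing import Pool

def run_command(path):
    # subprocess.call blocks until the command exits, so each worker
    # runs exactly one spark-submit at a time.
    return subprocess.call("spark-submit driver.py -i {}".format(path), shell=True)

if __name__ == '__main__':
    paths = ["input_a", "input_b", "input_c"]   # placeholder input paths
    with Pool(processes=4) as pool:             # at most 4 concurrent submissions
        return_codes = pool.map(run_command, paths)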

Running external commands partly in parallel from python (or bash)

I am running a python script which creates a list of commands which should be executed by a compiled program (proprietary).
The program can split some of the calculations to run independently, and the data will then be collected afterwards.
I would like to run these calculations in parallel, as each is a very time-consuming single-threaded task and I have 16 cores available.
I am using subprocess to execute the commands (inside a class):
from subprocess import Popen, PIPE

def run_local(self):
    p = Popen(["someExecutable"], stdout=PIPE, stdin=PIPE)
    p.stdin.write(self.exec_string)
    p.stdin.flush()
    while p.poll() is None:
        line = p.stdout.readline()
        self.log(line)
Where self.exec_string is a string of all the commands.
This string can be split into an initial part, the part I want parallelised, and a finishing part.
How should I go about this?
Also, it seems the executable will "hang" (waiting for a command, e.g. "exit", which releases the memory) if a naive copy-paste of the current method is used for each part.
Bonus: The executable also has the option to run a bash script of commands, if it is easier/possible to parallelise bash?
For bash, it could be very simple. Assuming your file looks like this:
## init part##
ls
cd ..
ls
cat some_file.txt
## parallel ##
heavycalc &
heavycalc &
heavycalc &
## finish ##
wait
cat results.txt
With & behind a command you tell bash to run it as a background job. wait will then wait for all background jobs to finish, so you can be sure all calculations are done.
I've assumed your input txt-file contains plain bash commands.
Using GNU Parallel:
## init
cd foo
cp bar baz
## parallel ##
parallel heavycalc ::: file1 file2 file3 > results.txt
## finish ##
cat results.txt
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Submitting jobs using python

I am trying to submit a job in a cluster in our institute using python scripts.
compile_cmd = 'ifort -openmp ran_numbers.f90 ' + fname \
+ ' ompscmf.f90 -o scmf.o'
subprocess.Popen(compile_cmd, shell=True)
Popen('qsub launcher',shell=True)
The problem is that the system is hanging at this point. Any obvious mistakes in the above script? All the files mentioned in the code are available in that directory (I have cross-checked that). qsub is the command used to submit jobs to our cluster. fname is the name of a file that I created in the process.
I have a script that I used to submit multiple jobs to our cluster using qsub. qsub typically takes job submissions in the form
qsub [qsub options] job
In my line of work, job is typically a bash (.sh) or python script (.py) that actually calls the programs or code to be run on each node. If I wanted to submit a job called "test_job.sh" with maximum walltime, I would do
qsub -l walltime=72:00:00 test_job.sh
This amounts to the following python code
from subprocess import call
qsub_call = "qsub -l walltime=72:00:00 %s"
call(qsub_call % "test_job.sh", shell=True)
Alternatively, what if you had a bash script that looked like
#!/bin/bash
filename="your_filename_here"
ifort -openmp ran_numbers.f90 $filename ompscmf.f90 -o scmf.o
then submitted this via qsub job.sh?
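On the Python side that submission is just another call, following the same pattern as above (job.sh stands for whatever you named the wrapper script, and you can add the same -l walltime option if needed):

from subprocess import call

# Submit the wrapper script exactly as you would from the shell.
call("qsub job.sh", shell=True)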
Edit: Honestly, the optimal job queueing scheme varies from cluster to cluster. One simple way to simplify your job submission scripts is to find out how many CPUs are available on each node. Some of the more recent queueing systems allow you to submit many single-CPU jobs and will schedule them together on as few nodes as possible; however, some older clusters won't do that, and submitting many individual jobs is frowned upon.
Say that each node in your cluster has 8 CPUs. You could write your script like:
#!/bin/bash
#PBS -l nodes=1:ppn=8

for ((i=0; i<8; i++))
do
    ./myjob.sh filename_${i} &
done
wait
What this will do is run 8 tasks on one node at once (& means run in the background) and wait until all 8 are finished. This may be optimal for clusters with many CPUs per node (for example, one cluster that I used has 48 CPUs per node).
Alternatively, if submitting many single-core jobs is optimal and your submission code above isn't working, you could use python to generate bash scripts to pass to qsub.
#!/usr/bin/env python
import os
from subprocess import call

directory = '.'  # wherever your input files live

bash_lines = ['#!/bin/bash\n', '#PBS -l nodes=1:ppn=1\n']
bash_name = 'myjob_%i.sh'
job_call = 'ifort -openmp ran_numbers.f90 %s ompscmf.f90 -o scmf.o &\n'
qsub_call = 'qsub myjob_%i.sh'

filenames = [os.path.join(root, f) for root, _, files in os.walk(directory)
             for f in files if f.endswith('.txt')]

for i, filename in enumerate(filenames):
    with open(bash_name % i, 'w') as bash_file:
        bash_file.writelines(bash_lines + [job_call % filename, 'wait\n'])
    call(qsub_call % i, shell=True)
Did you get any errors? It seems you missed the subprocess. prefix at the second Popen.

Python - running a file listener from system boot until shutdown to perform automated actions, but it is not working

I have to read a file forever, from system boot until shutdown. Based on the lines appearing in the file, a python script has to take actions. But, weirdly, my python script is not doing so.
Step 1: In crontab, to run on boot, I have the following:
tail -f /var/tmp/event.log | python /var/tmp/event.py &
Step 2: Other applications are dumping lines into the file /var/tmp/event.log, for example:
java -cp Java.jar Main.main | tee -a /var/tmp/event.log
or
echo "tst" >> /var/tmp/event.log
or
otherapps | tee -a /var/tmp/event.log
Step 3: event.py has a loop that listens for the commands and executes them, but the execution is not happening:
import sys, time, os

while True:
    line = sys.stdin.readline()
    if line:
        if "runme.sh" in line:
            os.system("/var/tmp/runme.sh")
        if "killfirefox.sh" in line:
            os.system("/var/tmp/killfirefox.sh")
        if "shotemail.sh" in line:
            os.system("/var/tmp/shotemail.sh")
        if "scan.sh" in line:
            os.system("/var/tmp/scan.sh")
        if "screenshot.sh" in line:
            os.system("/var/tmp/screenshot.sh")
    else:
        time.sleep(1)
What is it that I am doing wrong? I have verified that event.log contains the correct command on each line, and when I manually run echo "runme.sh" | python /var/tmp/event.py it works, so why does it not work in boot-time/crontab mode?
Probably because the tail command's output is being buffered. When you write to a pipe in bash, the output is not immediately sent to the other command; it is buffered and then output later. For example, try running ls -l | more on a large directory. You'll see a noticeable delay before more outputs the listing.
There are a number of ways around this. You could use stdbuf (see its man page for more on it) like so:
stdbuf -i0 -e0 -o0 tail -f /var/tmp/event.log | python /var/tmp/event.py
Try this and see if it works better. A more elegant solution would be to use a temporary file that only gets written to when there is output to be read, which is then monitored by the python script, or a named pipe.
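The named-pipe idea mentioned above can look like the following minimal sketch (my addition; the pipe path /var/tmp/event.pipe is an assumption). The reader blocks on the FIFO, so no polling or buffering tricks are needed:

import os

PIPE_PATH = "/var/tmp/event.pipe"   # assumed path for the FIFO

if not os.path.exists(PIPE_PATH):
    os.mkfifo(PIPE_PATH)

while True:
    # open() blocks until a writer connects; the for-loop ends when the
    # writer closes its end, after which we reopen and wait again.
    with open(PIPE_PATH) as pipe:
        for line in pipe:
            if "runme.sh" in line:
                os.system("/var/tmp/runme.sh")
            # ... other handlers exactly as in event.py ...

The producers would then write echo "runme.sh" > /var/tmp/event.pipe instead of appending to the log file.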
