Python difficulty running functions in parallel

I am currently using Python 2.6 and am attempting to run another Python script multiple times with different input. Whatever I try in order to run it in the background, the script seems to wait for the process to complete before moving on to the next line. I have tried using
subprocess.Popen(Some.Func(Args))
and
T1 = threading.Thread(Some.Func(Args))
T1.start()
I would like to be able to run through multiple calls to the Some class without waiting on any particular one to finish.

You are not passing the arguments to these classes correctly: writing Some.Func(Args) calls the function immediately and hands its return value to Popen or Thread, so the work runs (and blocks) in the current thread before any worker is even created. Use multiprocessing.Process or threading.Thread, and specify your target and args separately. The following example demonstrates running ten processes in parallel, followed by ten threads in parallel:
#! /usr/bin/env python3
import multiprocessing
import threading


def main():
    for executor in multiprocessing.Process, threading.Thread:
        engines = []
        for _ in range(10):
            runner = executor(target=for_loop, args=(0, 10000000, 1))
            runner.start()
            engines.append(runner)
        for runner in engines:
            runner.join()


def for_loop(start, stop, step):
    accumulator = start
    while accumulator < stop:
        accumulator += step


if __name__ == '__main__':
    main()
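Since the original question also tries subprocess.Popen, here is a minimal sketch of that route for launching another script several times without blocking; the script name and input values below are placeholders, not from the question. Popen returns as soon as each child is started, so the loop itself never waits:
import subprocess
import sys

inputs = ['input1.txt', 'input2.txt', 'input3.txt']  # placeholder inputs

# Popen returns immediately after starting each child, so nothing here blocks
procs = [subprocess.Popen([sys.executable, 'some_script.py', arg]) for arg in inputs]

# only now wait for all of them to finish
for proc in procs:
    proc.wait()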

Related

Using multithreading - worker not defined when function present

I have a bit of a long script, so a minimum viable example may not be easily possible.
I'm trying to follow some previous SO posts about threading - I wish to execute an exe dozens of times and to use as many CPU cores as possible. Early days here.
First I'm defining a list with elements that are dynamically populated
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool
work_items = []
Then I'm defining a function, at the same indent level as all my other working functions:
#To help with executing my exe in parallel
def worker(tup):
    subprocess.call(tup)
Then, finally, inside the function that will call this function:
#Execute jobs
start = time.time()
with ThreadPool(4) as pool:
    work_results = pool.map(worker, work_items)
end = time.time()
print(end - start)
The line that is causing me grief is work_results = pool.map(worker, work_items). My linter in VS Code, and the Python shell when I attempt to test, both report that worker is not defined. My understanding is that the function should be in scope, since it is defined.
Is there something here that stands out as an issue as to why it would be reporting that worker is an undefined function?
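Without the full script it is hard to say for certain, but a frequent cause of this error is that worker ends up being defined after the code that calls it has already run, or inside a different function's scope. For comparison, a minimal self-contained layout that does work (the exe command lines here are placeholders) would be:
import subprocess
import time
from multiprocessing.dummy import Pool as ThreadPool

# worker lives at module level, before anything tries to call it
def worker(tup):
    subprocess.call(tup)

def run_jobs(work_items):
    start = time.time()
    with ThreadPool(4) as pool:
        work_results = pool.map(worker, work_items)
    print(time.time() - start)
    return work_results

if __name__ == '__main__':
    # placeholder command lines; each inner list is one exe invocation
    run_jobs([['my_tool.exe', '--run', '1'], ['my_tool.exe', '--run', '2']])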

How to run n processes simultaneously in Python

I am trying to execute n processes simultaneously. The example below works with 2 processes that are supplied externally.
At the moment it is all hard-coded for just these 2 processes, but I would need to come up with a generic solution for how to accomplish the same thing, i.e. run n processes at the same time.
My code is as follows:
import multiprocessing

'''
The first process: print 'aa'
The second process: print 'BB'
'''

def TR1():
    print 'aaaaaaaaa'

def TR2():
    print 'BBBBBBBB'

if __name__ == '__main__':
    process_1 = multiprocessing.Process(name='process_1', target=TR1)
    process_2 = multiprocessing.Process(name='process_2', target=TR2)
    process_1.start()
    process_2.start()
Thanks for your suggestions!
You can either spawn processes in a loop or use an executor pool.
In practice, the latter is often the preferred approach, since you can limit the pool size and gather results easily.
If you're using Python 2, there's a backport of concurrent.futures that includes ProcessPoolExecutor.
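For instance, a minimal sketch of the pool approach with concurrent.futures (standard library on Python 3; on Python 2 the futures backport provides the same ProcessPoolExecutor). The work function and n below are placeholders:
from concurrent.futures import ProcessPoolExecutor

def work(i):
    # placeholder job; replace with whatever each process should do
    return 'result %d' % i

if __name__ == '__main__':
    n = 4  # run n processes simultaneously
    with ProcessPoolExecutor(max_workers=n) as executor:
        for result in executor.map(work, range(n)):
            print(result)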

Python multiprocessing - Is it possible to introduce a fixed time delay between individual processes?

I have searched and cannot find an answer to this question elsewhere. Hopefully I haven't missed something.
I am trying to use Python multiprocessing to essentially batch run some proprietary models in parallel. I have, say, 200 simulations, and I want to batch run them ~10-20 at a time. My problem is that the proprietary software crashes if two models happen to start at the same / similar time. I need to introduce a delay between processes spawned by multiprocessing so that each new model run waits a little bit before starting.
So far, my solution has been to introduce a random time delay at the start of the child process before it fires off the model run. However, this only reduces the probability of any two runs starting at the same time, so I still run into problems when processing a large number of models. I therefore think the time delay needs to be built into the multiprocessing part of the code, but I haven't been able to find any documentation or examples of this.
Edit: I am using Python 2.7
This is my code so far:
from time import sleep
import numpy as np
import subprocess
import multiprocessing


def runmodels(arg):
    # this is my interim solution to reduce the probability that any two runs
    # start at the same time, but it isn't a guaranteed solution
    sleep(np.random.rand(1, 1) * 120)
    subprocess.call(arg)  # this line actually fires off the model run


if __name__ == '__main__':
    arguments = [big list of runs in here
                 ]
    count = 12
    pool = multiprocessing.Pool(processes=count)
    r = pool.imap_unordered(runmodels, arguments)
    pool.close()
    pool.join()
multiprocessing.Pool() already limits the number of processes running concurrently.
You could use a lock to separate the starting times of the processes (not tested):
import threading
import multiprocessing


def init(lock):
    global starting
    starting = lock


def run_model(arg):
    starting.acquire()  # no other process can get it until it is released
    threading.Timer(1, starting.release).start()  # release in a second
    # ... start your simulation here


if __name__ == "__main__":
    arguments = ...
    pool = multiprocessing.Pool(processes=12,
                                initializer=init, initargs=[multiprocessing.Lock()])
    for _ in pool.imap_unordered(run_model, arguments):
        pass
One way to do this with threads and a semaphore:
from time import sleep
import subprocess
import threading


def runmodels(arg):
    subprocess.call(arg)
    sGlobal.release()  # release for next launch


if __name__ == '__main__':
    threads = []
    sGlobal = threading.Semaphore(12)  # semaphore for at most 12 threads
    arguments = [big list of runs in here
                 ]
    for arg in arguments:
        sGlobal.acquire()  # block if more than 12 threads are running
        t = threading.Thread(target=runmodels, args=(arg,))
        threads.append(t)
        t.start()
        sleep(1)
    for t in threads:
        t.join()
The answer suggested by jfs caused problems for me as a result of starting a new thread with threading.Timer. If the worker just so happens to finish before the timer does, the timer is killed and the lock is never released.
I propose an alternative route, in which each successive worker will wait until enough time has passed since the start of the previous one. This seems to have the same desired effect, but without having to rely on another child process.
import multiprocessing as mp
import time


def init(shared_val):
    global start_time
    start_time = shared_val


def run_model(arg):
    with start_time.get_lock():
        wait_time = max(0, start_time.value - time.time())
        time.sleep(wait_time)
        start_time.value = time.time() + 1.0  # specify interval here
    # ... start your simulation here


if __name__ == "__main__":
    arguments = ...
    pool = mp.Pool(processes=12,
                   initializer=init, initargs=[mp.Value('d')])
    for _ in pool.imap_unordered(run_model, arguments):
        pass

How to run parallel programs in python

I have a Python script that runs a few external commands using the subprocess module. But one of these steps takes a long time, so I would like to run it separately. I need to launch the runs, check that they are finished, and then execute the next command, which is not parallel.
My code is something like this:
nproc = 24
for i in xrange(nproc):
    #Run program in parallel

#Combine files generated by the parallel step
for i in xrange(nproc):
    handle = open('Niben_%s_structures' % (zfile_name), 'w')
    for i in xrange(nproc):
        for zline in open('Niben_%s_file%d_structures' % (zfile_name, i)):
            handle.write(zline)
    handle.close()

#Run next step
cmd = 'bowtie-build -f Niben_%s_precursors.fa bowtie-index/Niben_%s_precursors' % (zfile_name, zfile_name)
For your example, you just want to shell out in parallel - you don't need threads for that.
Use the Popen constructor in the subprocess module: http://docs.python.org/library/subprocess.htm
Collect the Popen instances for each process you spawned and then wait() for them to finish:
procs = []
for i in xrange(nproc):
    procs.append(subprocess.Popen(ARGS_GO_HERE))  #Run program in parallel

for p in procs:
    p.wait()
You can get away with this (as opposed to using the multiprocessing or threading modules), since you aren't really interested in having these interoperate - you just want the os to run them in parallel and be sure they are all finished when you go to combine the results...
Running things in parallel can also be implemented using multiple processes in Python. I had written a blog post on this topic a while ago, you can find it here
http://multicodecjukebox.blogspot.de/2010/11/parallelizing-multiprocessing-commands.html
Basically, the idea is to use "worker processes" which independently retrieve jobs from a queue and then complete these jobs.
Works quite well in my experience.
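As a rough illustration of that worker/queue pattern (not the blog post's code): each worker process pulls command lines from a shared queue until it sees a None sentinel. The commands and worker count here are placeholders:
import multiprocessing
import subprocess

def worker(job_queue):
    # each worker pulls commands until it sees the None sentinel
    for cmd in iter(job_queue.get, None):
        subprocess.call(cmd)

if __name__ == '__main__':
    jobs = [['echo', 'job %d' % i] for i in range(24)]  # placeholder commands

    queue = multiprocessing.Queue()
    for cmd in jobs:
        queue.put(cmd)

    nworkers = 4
    for _ in range(nworkers):
        queue.put(None)  # one stop sentinel per worker

    procs = [multiprocessing.Process(target=worker, args=(queue,)) for _ in range(nworkers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()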
You can do it using threads. Here is a very short (and untested) example with a very ugly if/else deciding what each thread actually does, but you can write your own worker classes.
import threading

class Worker(threading.Thread):
    def __init__(self, i):
        super(Worker, self).__init__()
        self._i = i

    def run(self):
        if self._i == 1:
            self.result = do_this()
        elif self._i == 2:
            self.result = do_that()


threads = []
nproc = 24

#Run program in parallel
for i in xrange(nproc):
    w = Worker(i)
    threads.append(w)
    w.start()

#Wait for every worker; joining inside the start loop would serialise them
for w in threads:
    w.join()
# ...now all threads are done

#Combine files generated by the parallel step
for i in xrange(nproc):
    handle = open('Niben_%s_structures' % (zfile_name), 'w')
    ...etc...

basic multiprocessing with python

I have found information on multiprocessing and multithreading in Python, but I don't understand the basic concepts, and all the examples I found are more complicated than what I'm trying to do.
I have X independent programs that I need to run. I want to launch the first Y programs (where Y is the number of cores of my computer and X>>Y). As soon as one of the independent programs is done, I want the next program to run in the next available core. I thought that this would be straightforward, but I keep getting stuck on it. Any help in solving this problem would be much appreciated.
Edit: Thanks a lot for your answers. I also found another solution using the joblib module that I wanted to share. Suppose that you have a script called 'program.py' that you want to run with different combinations of the input parameters (a0, b0, c0), and you want to use all your cores. Here is a solution:
import os
from numpy import arange
from joblib import Parallel, delayed

a0 = arange(0.1, 1.1, 0.1)
b0 = arange(-1.5, -0.4, 0.1)
c0 = arange(1., 5., 0.1)

params = []
for i in range(len(a0)):
    for j in range(len(b0)):
        for k in range(len(c0)):
            params.append((a0[i], b0[j], c0[k]))

def func(parameters):
    s = 'python program.py %g %g %g' % (parameters[0], parameters[1], parameters[2])
    command = os.system(s)
    return command

output = Parallel(n_jobs=-1, verbose=1000)(delayed(func)(i) for i in params)
You want to use multiprocessing.Pool, which represents a "pool" of workers (default one per core, though you can specify another number) that do your jobs. You then submit jobs to the pool, and the workers handle them as they become available. The easiest function to use is Pool.map, which runs a given function for each of the arguments in the passed sequence, and returns the result for each argument. If you don't need return values, you could also use apply_async in a loop.
import multiprocessing

def do_work(arg):
    pass  # do whatever you actually want to do

def run_battery(args):
    # args should be like [arg1, arg2, ...]
    pool = multiprocessing.Pool()
    ret_vals = pool.map(do_work, args)
    pool.close()
    pool.join()
    return ret_vals
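For the apply_async-in-a-loop variant mentioned above, a sketch along these lines might work (do_work is again just a placeholder):
import multiprocessing

def do_work(arg):
    pass  # placeholder, same as above

def run_battery_async(args):
    pool = multiprocessing.Pool()
    # submit every job without waiting; each call returns an AsyncResult
    results = [pool.apply_async(do_work, (arg,)) for arg in args]
    pool.close()
    pool.join()
    return [r.get() for r in results]  # gather return values, if needed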
If you're trying to call external programs and not just Python functions, use subprocess. For example, this will call cmd_name with the list of arguments passed, raise an exception if the return code isn't 0, and return the output:
import subprocess

def do_work(subproc_args):
    return subprocess.check_output(['cmd_name'] + list(subproc_args))
Hi, I'm using the QThread object from PyQt.
From what I understood, your thread, while it is running, can only use its own variables and methods; it cannot change your main object's variables.
So before you run it, be sure to define all the QThread variables you will need,
like this for example:
class Worker(QThread):
    def define(self, phase):
        print 'define'
        self.phase = phase
        self.start()  # will run your thread

    def continueJob(self):
        self.start()

    def run(self):
        self.launchProgramme(self.phase)
        self.phase += 1

    def launchProgramme(self, phase):
        print phase
I'm not well aware of how basic Python threads work, but in PyQt your thread emits a signal
to your main object, like this:
class mainObject(QtGui.QMainWindow):
    def __init__(self):
        super(mainObject, self).__init__()
        self.numberProgramme = 4
        self.thread = Worker()
        # connect the thread's finished/terminated signals to the main object
        self.connect(self.thread, QtCore.SIGNAL("finished()"), self.threadStopped)
        self.connect(self.thread, QtCore.SIGNAL("terminated()"), self.threadStopped)
Connected like this, when the thread's run() finishes, it will call the threadStopped method in your main object, where you can read the values of your thread's variables:
def threadStopped(self):
    value = self.thread.phase
    if value < self.numberProgramme:
        self.thread.continueJob()
After that, you just have to launch another thread or not, depending on the value you get.
This is for PyQt threading, of course; with basic Python threads, the way threadStopped gets executed could be different.
