basic multiprocessing with python

I have found information on multiprocessing and multithreading in Python, but I don't understand the basic concepts, and all the examples I found are more complicated than what I'm trying to do.
I have X independent programs that I need to run. I want to launch the first Y programs (where Y is the number of cores of my computer and X>>Y). As soon as one of the independent programs is done, I want the next program to run in the next available core. I thought that this would be straightforward, but I keep getting stuck on it. Any help in solving this problem would be much appreciated.
Edit: Thanks a lot for your answers. I also found another solution using the joblib module that I wanted to share. Suppose that you have a script called 'program.py' that you want to run with different combinations of the input parameters (a0, b0, c0) and you want to use all your cores. This is one solution:
import os
from joblib import Parallel, delayed
from numpy import arange

a0 = arange(0.1, 1.1, 0.1)
b0 = arange(-1.5, -0.4, 0.1)
c0 = arange(1., 5., 0.1)

params = []
for i in range(len(a0)):
    for j in range(len(b0)):
        for k in range(len(c0)):
            params.append((a0[i], b0[j], c0[k]))

def func(parameters):
    # build the command line and run program.py as a subprocess
    s = 'python program.py %g %g %g' % parameters
    command = os.system(s)
    return command

output = Parallel(n_jobs=-1, verbose=1000)(delayed(func)(i) for i in params)

You want to use multiprocessing.Pool, which represents a "pool" of workers (default one per core, though you can specify another number) that do your jobs. You then submit jobs to the pool, and the workers handle them as they become available. The easiest function to use is Pool.map, which runs a given function for each of the arguments in the passed sequence, and returns the result for each argument. If you don't need return values, you could also use apply_async in a loop.
import multiprocessing

def do_work(arg):
    pass  # do whatever you actually want to do

def run_battery(args):
    # args should be like [arg1, arg2, ...]
    pool = multiprocessing.Pool()
    ret_vals = pool.map(do_work, args)
    pool.close()
    pool.join()
    return ret_vals
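If you go the apply_async route instead (for example because you don't need return values), a minimal sketch might look like the following; do_work and the argument list are placeholders for your own code:
import multiprocessing

def do_work(arg):
    pass  # do whatever you actually want to do

def run_battery_async(args):
    pool = multiprocessing.Pool()
    for arg in args:
        pool.apply_async(do_work, (arg,))  # submit without waiting for the result
    pool.close()
    pool.join()  # wait for every submitted job to finish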
If you're trying to call external programs and not just Python functions, use subprocess. For example, this will call cmd_name with the list of arguments passed, raise an exception if the return code isn't 0, and return the output:
import subprocess

def do_work(subproc_args):
    return subprocess.check_output(['cmd_name'] + list(subproc_args))

Hi, I'm using the QThread class from PyQt.
From what I understood, while your thread is running it can only use its own variables and methods; it cannot change your main object's variables.
So before you run it, be sure to define all the QThread variables you will need, like this for example:
class Worker(QThread):
    def define(self, phase):
        print 'define'
        self.phase = phase
        self.start()  # will run your thread
    def continueJob(self):
        self.start()
    def run(self):
        self.launchProgramme(self.phase)
        self.phase += 1
    def launchProgramme(self, phase):
        print phase
I'm not too familiar with how basic Python threads work, but in PyQt your thread emits a signal to your main object like this:
class mainObject(QtGui.QMainWindow):
    def __init__(self):
        super(mainObject, self).__init__()
        self.numberProgramme = 4
        self.thread = Worker()
        # connect the thread's finished/terminated signals to threadStopped
        self.connect(self.thread, QtCore.SIGNAL("finished()"), self.threadStopped)
        self.connect(self.thread, QtCore.SIGNAL("terminated()"), self.threadStopped)
Connected like this, when thread.run stops it will call the threadStopped method on your main object, where you can read the values of your thread's variables:
def threadStopped(self):
    value = self.thread.phase
    if value < self.numberProgramme:
        self.thread.continueJob()
After that you just have to launch another thread (or not), depending on the value you get.
This is for PyQt threading of course; with basic Python threads, the way to trigger something like threadStopped could be different.

Related

multiprocessing in python using pool.map_async

Hi, I don't feel like I have quite understood multiprocessing in Python correctly.
I want to run a function called 'run_worker' (which is simply code that runs and manages a subprocess) 20 times in parallel and wait for all the functions to complete. Each run_worker should run on a separate core/thread. I don't mind what order the processes complete in, hence I used async, and I don't have a return value, so I used map.
I thought that I should use:
import multiprocessing as mp

if __name__ == "__main__":
    num_workers = 20
    param_map = []
    for i in range(num_workers):
        param_map += [experiment_id]
    pool = mp.Pool(processes=num_workers)
    pool.map_async(run_worker, param_map)
    pool.close()
    pool.join()
However, this code exits straight away and doesn't appear to execute run_worker properly. Also, do I really have to create a param_map of the same experiment_id to pass to the worker? It seems like a hack just to control the number of run_workers created. Ideally I would like to run a function with no parameters and no return value over multiple cores.
Note: I am using Windows Server 2019 on AWS.
Edit: added run_worker, which calls a subprocess that writes to a file:
def run_worker(experiment_id):
    hostname = socket.gethostname()
    experiment = conn.experiments(experiment_id).fetch()
    while experiment.progress.observation_count < experiment.observation_budget:
        suggestion = conn.experiments(experiment.id).suggestions().create()
        value = evaluate_model(suggestion.assignments)
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id,
            value=value,
            metadata=dict(hostname=hostname),
        )
        # Update the experiment object
        experiment = conn.experiments(experiment_id).fetch()
It seems that for this simple purpose you would be better off using pool.map instead of pool.map_async. They both run the jobs in parallel; however, pool.map blocks until all operations are finished (see also this question). pool.map_async is meant especially for situations like this:
result = pool.map_async(func, iterable)
while not result.ready():
    # do some work while map_async is running
    pass
# blocking call to get the result
out = result.get()
Regarding your question about the parameters, the fundamental idea of a map operation is to map the values of one list/array/iterable to a new list of values of the same size. As far as I can see in the docs, multiprocessing does not provide any method to run multiple functions without parameters.
If you would also share your run_worker function, that might help to get better answers to your question. That might also clear up why you would run a function without any arguments and return values using a map operation in the first place.
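That said, if you really do want to run a parameterless function N times with a pool, one common workaround is to map over a dummy iterable with a wrapper that ignores its argument. A minimal sketch (task and _ignore_arg are placeholder names, not from the question's code):
import multiprocessing as mp

def task():
    pass  # the no-argument, no-return-value work goes here

def _ignore_arg(_):
    task()

if __name__ == "__main__":
    with mp.Pool(processes=4) as pool:
        pool.map(_ignore_arg, range(20))  # runs task() 20 times across the pool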

Running multiple python scripts from a single script, and communicating back and forth between them?

I have a script that I wrote that I am able to pass arguments to, and I want to launch multiple simultaneous iterations (maybe 100+) with unique arguments. My plan was to write another Python script which then launches these subscripts/processes; however, to be effective, I need that script to be able to monitor the subscripts for any errors.
Is there any straightforward way to do this, or a library that offers this functionality? I've been searching for a while and am not having good luck finding anything. Creating subprocesses and multiple threads seems straightforward enough, but I can't really find any guides or tutorials on how to then communicate with those threads/subprocesses.
A better way to do this would be to make use of threads. If you turn the script you want to call into a function in this larger script, your main function can call that function as many times as you want and have the threads report back with information as needed. You can read a little bit about how threads work here.
I suggest using threading.Thread or multiprocessing.Process, depending on your requirements.
A simple way to communicate between threads/processes is to use a Queue. The multiprocessing module also provides other ways to communicate between processes (Queue, Event, Manager, ...).
You can see some elementary communication in the example:
import threading
from Queue import Queue
import random
import time

class Worker(threading.Thread):
    def __init__(self, name, queue_error):
        threading.Thread.__init__(self)
        self.name = name
        self.queue_error = queue_error

    def run(self):
        time.sleep(random.randrange(1, 10))
        # Do some processing ...
        # Report errors
        self.queue_error.put((self.name, 'Error state'))

class Launcher(object):
    def __init__(self):
        self.queue_error = Queue()

    def main_loop(self):
        # Start threads
        for i in range(10):
            w = Worker(i, self.queue_error)
            w.start()
        # Check for errors
        while True:
            while not self.queue_error.empty():
                error_data = self.queue_error.get()
                print 'Worker #%s reported error: %s' % (error_data[0], error_data[1])
            time.sleep(0.1)

if __name__ == '__main__':
    l = Launcher()
    l.main_loop()
Like someone else said, you have to use multiple processes for true parallelism instead of threads, because the GIL prevents Python threads from executing bytecode in parallel (they run concurrently, but only one executes Python code at a time).
If you want to use the standard multiprocessing library (which is based on launching multiple processes), I suggest using a pool of workers. If I understood correctly, you want to launch 100+ parallel instances. Launching 100+ processes on one host will generate too much overhead. Instead, create a pool of P workers where P is for example the number of cores in your machine and submit the 100+ jobs to the pool. This is simple to do and there are many examples on the web. Also, when you submit jobs to the pool, you can provide a callback function to receive errors. This may be sufficient for your needs (there are examples here).
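To illustrate the error-callback idea, here is a minimal sketch (the job function and its failure condition are made up for the example); in Python 3, Pool.apply_async accepts an error_callback that is invoked with the exception raised inside the worker:
import multiprocessing

def job(arg):
    if arg % 7 == 0:
        raise ValueError('job %d failed' % arg)  # simulate an occasional failure
    return arg * 2

def on_error(exc):
    print('worker reported error:', exc)

if __name__ == '__main__':
    pool = multiprocessing.Pool()  # defaults to one worker per core
    for i in range(100):
        pool.apply_async(job, (i,), error_callback=on_error)
    pool.close()
    pool.join()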
The Pool in multiprocessing however can't distribute work across multiple hosts (e.g. cluster of machines) last time I looked. So, if you need to do this, or if you need a more flexible communication scheme, like being able to send updates to the controlling process while the workers are running, my suggestion is to use charm4py (note that I am a charm4py developer so this is where I have experience).
With charm4py you could create N workers which are distributed among P processes by the runtime (works across multiple hosts), and the workers can communicate with the controller simply by doing remote method invocation. Here is a small example:
from charm4py import charm, Chare, Group, Array, ArrayMap, Reducer, threaded
import time

WORKER_ITERATIONS = 100


class Worker(Chare):

    def __init__(self, controller):
        self.controller = controller

    @threaded
    def work(self, x, done_future):
        result = -1
        try:
            for i in range(WORKER_ITERATIONS):
                if i % 20 == 0:
                    # send status update to controller
                    self.controller.progressUpdate(self.thisIndex, i, ret=True).get()
                if i == 5 and self.thisIndex[0] % 2 == 0:
                    # trigger NameError on even-numbered workers
                    test[3] = 3
                time.sleep(0.01)
            result = x**2
        except Exception as e:
            # send error to controller
            self.controller.collectError(self.thisIndex, e)
        # send result to controller
        self.contribute(result, Reducer.gather, done_future)


# This custom map is used to prevent workers from being created on process 0
# (where the controller is). Not strictly needed, but allows more timely
# controller output
class WorkerMap(ArrayMap):
    def procNum(self, index):
        return (index[0] % (charm.numPes() - 1)) + 1


class Controller(Chare):

    def __init__(self, args):
        self.startTime = time.time()
        done_future = charm.createFuture()
        # create 12 workers, which are distributed by charm4py among processes
        workers = Array(Worker, 12, args=[self.thisProxy], map=Group(WorkerMap))
        # start work
        for i in range(12):
            workers[i].work(i, done_future)
        print('Results are', done_future.get())  # wait for result
        exit()

    def progressUpdate(self, worker_id, current_step):
        print(round(time.time() - self.startTime, 3), ': Worker', worker_id,
              'progress', current_step * 100 / WORKER_ITERATIONS, '%')
        # the controller can return a value here and the worker would receive it

    def collectError(self, worker_id, error):
        print(round(time.time() - self.startTime, 3), ': Got error', error,
              'from worker', worker_id)


charm.start(Controller)
In this example, the Controller will print status updates and errors as they happen. It will print the final results from all workers when they are all done. The result for workers that have failed will be -1.
The number of processes P is given at launch. The runtime will distribute the N workers among the available processes. This happens when the workers are created and there is no dynamic load balancing in this particular example.
Also, note that in the charm4py model remote method invocation is asynchronous and returns a future which the caller can block on, but only the calling thread blocks (not the whole process).
Hope this helps.

How to use multiprocess in python on a class object

I am fairly new to Python, and my experience is specific to its use in Powerflow modelling through the API provided in Siemens PSS/e. I have a script that I have been using for several years that runs some simulation on a large data set.
In order to get it to finish quickly, I usually split the inputs up into multiple parts, then run multiple instances of the script in IDLE. I've recently added a GUI for the inputs, and have refined the code to be more object oriented, creating a class that the GUI passes the inputs to but then works as the original script did.
My question is how do I go about splitting the process from within the program itself rather than making multiple copies? I have read a bit about the multiprocessing module but I am not sure how to apply it to my situation. Essentially I want to be able to instantiate N instances of the same object, each running in parallel.
The class I have now (called Bot) is passed a set of arguments and creates a CSV output while it runs until it finishes. I have a separate block of code that puts the pieces together at the end, but for now I just need to understand the best approach to kicking multiple Bot objects off once I hit run in my GUI. There are inputs in the GUI to specify the number of N segments to be used.
I apologize ahead of time if my question is rather vague. Thanks for any information at all, as I'm sort of stuck and don't know where to look for better answers.
Create a list of configurations:
configurations = [...]
Create a function which takes the relevant configuration, and makes use of your Bot:
def function(configuration):
    bot = Bot(configuration)
    bot.create_csv()
Create a Pool of workers with however many CPUs you want to use:
from multiprocessing import Pool
pool = Pool(3)
Call the function once with each configuration in your list of configurations:
pool.map(function, configurations)
For example:
from multiprocessing import Pool
import os

class Bot:
    def __init__(self, inputs):
        self.inputs = inputs

    def create_csv(self):
        pid = os.getpid()
        print('TODO: create csv in process {} using {}'
              .format(pid, self.inputs))

def use_bot(inputs):
    bot = Bot(inputs)
    bot.create_csv()

def main():
    configurations = [
        ['input1_1.txt', 'input1_2.txt'],
        ['input2_1.txt', 'input2_2.txt'],
        ['input3_1.txt', 'input3_2.txt']]
    pool = Pool(2)
    pool.map(use_bot, configurations)

if __name__ == '__main__':
    main()
Output:
TODO: create csv in process 10964 using ['input2_1.txt', 'input2_2.txt']
TODO: create csv in process 8616 using ['input1_1.txt', 'input1_2.txt']
TODO: create csv in process 8616 using ['input3_1.txt', 'input3_2.txt']
If you'd like to make life a little less complicated, you can use multiprocess instead of multiprocessing, as it has better support for classes and for working in the interpreter. As you can see below, we can now work directly with a method on a class instance, which is not possible with multiprocessing.
>>> from multiprocess import Pool
>>> import os
>>>
>>> class Bot(object):
...     def __init__(self, x):
...         self.x = x
...     def doit(self, y):
...         pid = os.getpid()
...         return (pid, self.x + y)
...
>>> p = Pool()
>>> b = Bot(5)
>>> results = p.imap(b.doit, range(4))
>>> print dict(results)
{46552: 7, 46553: 8, 46550: 5, 46551: 6}
>>> p.close()
>>> p.join()
Above, I'm using imap, to get an iterator on the results, which I'll just dump into a dict. Note that you should close your pools after you are done, to clean up. On Windows, you may also want to look at freeze_support, for cases where the code otherwise fails to run.
>>> import multiprocess as mp
>>> mp.freeze_support
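For reference, freeze_support is normally called at the top of the __main__ block; a minimal sketch, assuming multiprocess mirrors the multiprocessing API here (the call only has an effect in frozen Windows executables):
from multiprocess import Pool, freeze_support

def square(x):
    return x * x

if __name__ == '__main__':
    freeze_support()  # no-op unless running as a frozen Windows executable
    pool = Pool()
    print(pool.map(square, range(4)))
    pool.close()
    pool.join()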

Multiprocesses python with shared memory

I have an object that connects to a remote websocket server. I need to do some parallel processing at the same time; however, I don't want to create a new connection to the server. Since threads are the easier way to do this, that is what I have been using so far. However, I have been getting huge latency because of the GIL. Can I achieve the same thing as with threads, but with multiple processes in parallel?
This is the code that I have:
import thread

class WebSocketApp(object):
    def on_open(self):
        # Create another thread to make sure the commands are always being read
        print "Creating thread..."
        try:
            thread.start_new_thread(self.read_commands, ())
        except:
            print "Error: Unable to start thread"
Is there an equivalent way to do this with multiprocesses?
Thanks!
The direct equivalent is
import multiprocessing

class WebSocketApp(object):
    def on_open(self):
        # Create another process to make sure the commands are always being read
        print "Creating process..."
        try:
            multiprocessing.Process(target=self.read_commands).start()
        except:
            print "Error: Unable to start process"
However, this doesn't address the "shared memory" aspect, which has to be handled a little differently than it is with threads, where you can just use global variables. You haven't really specified what objects you need to share between processes, so it's hard to say exactly what approach you should take. The multiprocessing documentation does cover ways to deal with shared state, however. Do note that in general it's better to avoid shared state if possible, and just explicitly pass state between the processes, either as an argument to the Process constructor or via something like a Queue.
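As a minimal sketch of that explicit-passing approach (read_commands and the command strings below are placeholders, not the question's actual code), a multiprocessing.Queue can carry data from the child process back to the parent:
import multiprocessing

def read_commands(queue):
    # placeholder worker: the real app would read from the websocket here
    for command in ['start', 'status', 'stop']:
        queue.put(command)

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=read_commands, args=(queue,))
    proc.start()
    for _ in range(3):
        print(queue.get())  # receive data sent by the child process
    proc.join()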
You sure can, use something along the lines of:
from multiprocessing import Process

class WebSocketApp(object):
    def on_open(self):
        # Create another process to make sure the commands are always being read
        print "Creating process..."
        try:
            p = Process(target=WebSocketApp.read_commands, args=(self,))  # Add other arguments to this tuple
            p.start()
        except:
            print "Error: Unable to start process"
It is important to note, however, that as soon as the object is sent to the other process, the two copies of self (one in each process) diverge and represent different objects. If you wish to communicate, you will need to use something like the Queue or Pipe included in the multiprocessing module.
You may need to keep a reference to all the processes (p in this case) in your main thread in order to be able to tell them that your program is terminating (a still-running child process will make the parent appear to hang when it exits), but that depends on the nature of your program.
If you wish to keep the object the same, you can do one of a few things:
Make all of your object properties either single values or arrays and then do something similar to this:
from multiprocessing import Process, Value, Array

class WebSocketApp(object):
    def __init__(self):
        self.my_value = Value('d', 0.3)
        self.my_array = Array('i', [4, 10, 4])
    # -- Snip --
And then these values should work as shared memory. The types are very restrictive, though (you must specify their type codes).
A different approach is to use a Manager:
from multiprocessing import Process, Manager

class WebSocketApp(object):
    def __init__(self):
        self.my_manager = Manager()
        self.my_list = self.my_manager.list()
        self.my_dict = self.my_manager.dict()
    # -- Snip --
And then self.my_list and self.my_dict act as a shared-memory list and dictionary respectively.
However, the types for both of these approaches can be restrictive so you may have to roll your own technique with a Queue and a Semaphore. But it depends what you're doing.
Check out the multiprocessing documentation for more information.
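To make the sharing concrete, here is a minimal standalone sketch (not tied to the WebSocketApp class above) of a child process mutating Manager objects that the parent then reads:
from multiprocessing import Process, Manager

def worker(shared_list, shared_dict):
    shared_list.append('command from child')
    shared_dict['status'] = 'done'

if __name__ == '__main__':
    manager = Manager()
    my_list = manager.list()
    my_dict = manager.dict()
    p = Process(target=worker, args=(my_list, my_dict))
    p.start()
    p.join()
    print(list(my_list))  # ['command from child']
    print(dict(my_dict))  # {'status': 'done'}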

simplifying threading in python

I am looking for a way to ease my threaded code.
There are a lot of places in my code where I do something like:
for arg in array:
    t = Thread(target=lambda: myFunction(arg))
    t.start()
i.e. running the same function, each time with different parameters, in threads.
This is of course a simplified version of the real code, and usually the code inside the for loop is ~10-20 lines long, that cannot be made simple by using one auxiliary function like myFunction in the example above (had that been the case, I could've just used a thread pool).
Also, this scenario is very, very common in my code, so there are tons of lines which I consider redundant. It would help me a lot if I didn't need to handle all this boilerplate code, but instead be able to do something like:
for arg in array:
    with threaded():
        myFunction(arg)
i.e. somehow threaded() takes every line of code inside it and runs it in a separate thread.
I know that context managers aren't supposed to be used in such situations, that it's probably a bad idea and will require an ugly hack, but nonetheless - can it be done, and how?
How about this:
for arg in array:
    def _thread():
        # code here
        print arg
    t = Thread(target=_thread)
    t.start()
Additionally, with decorators, you can sugar it up a little:
def spawn_thread(func):
    t = Thread(target=func)
    t.start()
    return t

for arg in array:
    @spawn_thread
    def _thread():
        # code here
        print arg
Would a thread pool help you here? Many implementations for Python exist, for example this one.
P.S: still interested to know what your exact use-case is
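If the standard library is an option, concurrent.futures.ThreadPoolExecutor (Python 3.2+) already covers this pattern. A minimal sketch, with myFunction standing in for your per-argument block of code:
from concurrent.futures import ThreadPoolExecutor

def myFunction(arg):
    pass  # placeholder for the 10-20 lines of per-argument work

array = [1, 2, 3, 4]
with ThreadPoolExecutor(max_workers=4) as executor:
    for arg in array:
        executor.submit(myFunction, arg)
# leaving the with-block waits for all submitted tasks to finish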
What you want is a kind of "contextual thread pool".
Take a look at the ThreadPool class in this module, designed to be used in a manner similar to the one you've described. Usage would be something like this:
with ThreadPool() as pool:
    for arg in array:
        pool.add_thread(target=myFunction, args=[arg])
Failures in any task given to a ThreadPool will flag an error, and perform the standard error backtrace handling.
I think you're over-complicating it. This is the "pattern" I use:
# util.py
import threading

def start_thread(func, *args):
    thread = threading.Thread(target=func, args=args)
    thread.setDaemon(True)
    thread.start()
    return thread

# in another module
import util
...
for arg in array:
    util.start_thread(myFunction, arg)
I don't see the big deal about having to create myFunction. You could even define the function inline with the function that starts it.
def do_stuff():
    def thread_main(arg):
        print "I'm a new thread with arg=%s" % arg

    for arg in array:
        util.start_thread(thread_main, arg)
If you're creating a large number of threads, a thread pool definitely makes more sense. You can easily make your own with the Queue and threading modules. Basically create a jobs queue, create N worker threads, give each thread a "pointer" to the queue and have them pull jobs from the queue and process them.
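A minimal sketch of that do-it-yourself pool, shown with Python 3's queue module (process_job is a placeholder for the real work):
import queue
import threading

def process_job(job):
    print('processing', job)  # placeholder for the real work

def worker(jobs):
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work
            break
        process_job(job)

jobs = queue.Queue()
threads = [threading.Thread(target=worker, args=(jobs,)) for _ in range(4)]
for t in threads:
    t.start()
for job in range(20):  # enqueue the jobs
    jobs.put(job)
for _ in threads:  # one sentinel per worker so they all exit
    jobs.put(None)
for t in threads:
    t.join()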
