Python Multiprocessing - apply class method to a list of objects

Is there a simple way to use Multiprocessing to do the equivalent of this?
for sim in sim_list:
    sim.run()
where the elements of sim_list are "simulation" objects and run() is a method of the simulation class which does modify the attributes of the objects. E.g.:
import subprocess

class simulation:
    def __init__(self):
        self.state = {'done': False}  # the dict must exist before it can be indexed
        self.cmd = "program"
    def run(self):
        subprocess.call(self.cmd)
        self.state['done'] = True
All the sims in sim_list are independent, so the strategy does not have to be thread safe.
I tried the following, which is obviously flawed: each process receives a copy of the argument (effectively a deep copy), so the object is not modified in place.
from multiprocessing import Process
for sim in sim_list:
    b = Process(target=simulation.run, args=[sim])
    b.start()
    b.join()

One way to do what you want is to have your computing class (simulation in your case) be a subclass of Process. When initialized properly, instances of this class will run in separate processes and you can set off a group of them from a list just like you wanted.
Here's an example, building on what you wrote above:
import multiprocessing
import os
import random
import sys

class simulation(multiprocessing.Process):
    def __init__(self, name):
        # must call this before anything else
        multiprocessing.Process.__init__(self)
        # then any other initialization
        self.name = name
        self.number = 0.0
        sys.stdout.write('[%s] created: %f\n' % (self.name, self.number))

    def run(self):
        sys.stdout.write('[%s] running ... process id: %s\n'
                         % (self.name, os.getpid()))
        self.number = random.uniform(0.0, 10.0)
        sys.stdout.write('[%s] completed: %f\n' % (self.name, self.number))
Then just make a list of objects and start each one with a loop:
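if __name__ == '__main__':  # required with the spawn/forkserver start methods (e.g. Windows, macOS)
    sim_list = []
    sim_list.append(simulation('foo'))
    sim_list.append(simulation('bar'))
    for sim in sim_list:
        sim.start()
    for sim in sim_list:
        sim.join()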
When you run this you should see each object run in its own process. Don't forget to call Process.__init__(self) as the very first thing in your class initialization, before anything else.
Obviously I've not included any interprocess communication in this example; you'll have to add that if your situation requires it (it wasn't clear from your question whether you needed it or not).
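In the example above, self.number is changed inside run(), which executes in the child process. The parent's copy of each object therefore still has number == 0.0 afterwards. The sketch below adds that communication. It assumes a multiprocessing.Queue passed in at construction. The result_queue parameter and the (name, number) message format are illustrative choices, not part of the original answer:

```python
import multiprocessing
import random

class Simulation(multiprocessing.Process):
    def __init__(self, name, result_queue):
        multiprocessing.Process.__init__(self)  # must come first
        self.name = name
        self.number = 0.0
        self.result_queue = result_queue  # hypothetical channel back to the parent

    def run(self):
        # Runs in the child process: changing self.number here does not change
        # the parent's object, so the value is sent back explicitly.
        self.number = random.uniform(0.0, 10.0)
        self.result_queue.put((self.name, self.number))

if __name__ == '__main__':
    result_queue = multiprocessing.Queue()
    sim_list = [Simulation('foo', result_queue), Simulation('bar', result_queue)]
    for sim in sim_list:
        sim.start()
    # Read one result per process before joining, so no child is left
    # blocked while flushing data into the queue.
    results = dict(result_queue.get() for _ in sim_list)
    for sim in sim_list:
        sim.join()
    # Copy the results onto the parent's objects.
    for sim in sim_list:
        sim.number = results[sim.name]
        print(sim.name, sim.number)
```

The results are drained from the queue before join(). The multiprocessing documentation warns that joining a process that still has queued data can deadlock.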
This approach works well for me, and I'm not aware of any drawbacks. If anyone knows of hidden dangers which I've overlooked, please let me know.
I hope this helps.

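For large data sets, Pool.imap may be the better fit. It consumes the iterable without first turning it into a list and yields results one at a time. The function handed to the pool must be picklable: use a module-level function rather than sim.start, which is not defined here. The modified copies also have to be returned and collected:
import multiprocessing as mp
def run_sim(sim):
    sim.run()
    return sim  # send the modified copy back to the parent
with mp.Pool(mp.cpu_count()) as pool:
    sim_list = list(pool.imap(run_sim, sim_list))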
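Below is a self-contained sketch of the pool approach, applied to a class shaped like the question's simulation. Assumptions: subprocess.call is replaced by a placeholder computation so that the example runs anywhere, and the steps and result attributes are hypothetical. Each worker receives a pickled copy of a simulation, runs it, and returns the modified copy, which replaces the original in the parent:

```python
import multiprocessing as mp

class Simulation:
    def __init__(self, steps):
        self.state = {'done': False}
        self.steps = steps   # hypothetical input parameter
        self.result = None

    def run(self):
        # Placeholder for subprocess.call(self.cmd) in the question.
        self.result = sum(range(self.steps))
        self.state['done'] = True

def run_sim(sim):
    # Executes in a worker on a pickled copy; returning it ships the
    # modified object back to the parent.
    sim.run()
    return sim

def run_all(sim_list, processes=None):
    # processes=None lets Pool use os.cpu_count() workers.
    with mp.Pool(processes) as pool:
        # map preserves input order, so result i corresponds to sim_list[i]
        return pool.map(run_sim, sim_list)

if __name__ == '__main__':
    sim_list = [Simulation(n) for n in (10, 100, 1000)]
    sim_list = run_all(sim_list)
    print([(s.result, s.state['done']) for s in sim_list])
    # [(45, True), (4950, True), (499500, True)]
```

The objects in the final sim_list are new copies, not the originals. Any other references to the old objects still see state['done'] == False.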

Related

Python multiprocessing and thread safety in shared objects

I am a bit uncertain regarding thread safety and multiprocessing.
From what I can tell multiprocessing.Pool.map pickles the calling function or object but leaves members passed by references intact.
This seems like it could be beneficial since it saves memory but I haven't found any information about the thread safety in those objects.
In my case I am trying to read numpy data from disk. However, I want to be able to modify the source without changing the implementation, so I've broken out the reading part into its own classes.
I roughly have the following situation:
import numpy as np
from multiprocessing import Pool
from pathlib import Path

class NpReader():
    def read_row(self, row_index):
        pass

class NpReaderSingleFile(NpReader):
    def read_row(self, row_index):
        return np.load(self._filename_from_row(row_index))
    def _filename_from_row(self, row_index):
        # Path() needs a string, not an int
        return Path(str(row_index)).with_suffix('.npy')

class NpReaderBatch(NpReader):
    def __init__(self, batch_file, mmap_mode=None):
        self.batch = np.load(batch_file, mmap_mode=mmap_mode)
    def read_row(self, row_index):
        read_index = row_index
        return self.batch[read_index]

class ProcessRow():
    def __init__(self, reader):
        self.reader = reader
    def __call__(self, row_index):
        # use the reader stored on the instance, not a global
        return self.reader.read_row(row_index).shape

if __name__ == '__main__':
    readers = [
        NpReaderSingleFile(),
        NpReaderBatch('batch.npy'),
        NpReaderBatch('batch.npy', mmap_mode='r')
    ]
    res = []
    for reader in readers:
        with Pool(12) as pool:
            res.append(pool.map(ProcessRow(reader), range(100000)))
It seems to me that a lot of things could go wrong here, but unfortunately I don't have the knowledge to determine what, or how to test for it.
Are there any obvious problems with the above approach?
Some things that occurred to me are:
np.load: it seems to work well for small single files, but how can I test that it is safe?
Is NpReaderBatch safe, or can read_index be modified at the same time by different processes?
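One way to probe the read_index concern without real .npy files is to record what each call sees. This is a minimal sketch that uses a hypothetical CountingReader in place of the NumPy readers. Pool.map sends a pickled copy of the callable, and therefore of the reader inside it, along with each chunk of tasks. Workers never share one reader object, and read_index is a local variable that exists only in the frame of a single call. One consequence is that an NpReaderBatch without mmap_mode has its loaded array serialized and sent with every chunk, which is worth measuring:

```python
import os
from multiprocessing import Pool

class CountingReader:
    """Hypothetical stand-in for an NpReader that mutates its own state on every read."""
    def __init__(self):
        self.calls = 0

    def read_row(self, row_index):
        read_index = row_index  # local variable: private to this call
        self.calls += 1         # changes only this process's copy of the reader
        return read_index, self.calls, os.getpid()

class ProcessRow:
    def __init__(self, reader):
        self.reader = reader

    def __call__(self, row_index):
        return self.reader.read_row(row_index)

if __name__ == '__main__':
    reader = CountingReader()
    with Pool(4) as pool:
        results = pool.map(ProcessRow(reader), range(20), chunksize=5)
    print('worker processes used:', len({pid for _, _, pid in results}))
    # The workers only ever touched copies, so the parent's reader is unchanged.
    print('calls seen by parent:', reader.calls)  # 0
    # Every row index comes back unchanged and in order.
    print(all(idx == i for i, (idx, _, _) in enumerate(results)))  # True
```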

object has no attribute while trying to define thread

This is the code:
import threading

class foo:
    def levelOne(self):
        def worker(self, i):
            print('doing hard work')
        def writer(self):
            print('writing workers work')

session = foo()
i = 0
threads = list()
for i in range(0, 5):
    thread = threading.Thread(target=session.levelOne.worker, args=(i,))
    thread.start()
    threads.append(thread)
writerThread = threading.Thread(target=session.levelOne.writer)
writerThread.start()
for thread in threads:
    thread.join()
writerThread.join()
5 workers should do the job and the writer should collect their results.
The error I get is: session object has no attribute worker
The workers are actually testers that do certain work in different "areas", while the writer keeps track of them without my workers having to return any result.
It's important for this algorithm to be divided into layers like "levelOne", "levelTwo" etc., because they will all work together. This is the main reason why I keep the threading outside the class rather than inside the levelOne method.
Please help me understand where I'm going wrong.
You certainly don't get "session object has no attribute worker" as the error message with the code you posted; the error should be "'function' object has no attribute 'worker'". And actually I don't know why you'd expect anything else: names defined within a function are local variables (hint: Python functions are objects just like any other), and they do not become attributes of the function.
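A quick check of that claim, using the question's own class: worker and writer appear among the local variable names of levelOne's code object, not as attributes of the function. They would only exist while levelOne is executing, and the question's code never even calls levelOne:

```python
class foo:
    def levelOne(self):
        def worker(self, i):
            print('doing hard work')
        def writer(self):
            print('writing workers work')

session = foo()
# The nested defs only bind local names inside levelOne.
print(foo.levelOne.__code__.co_varnames)   # ('self', 'worker', 'writer')
# Neither the function nor the bound method gains a 'worker' attribute.
print(hasattr(session.levelOne, 'worker'))  # False
```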
It's important for this algorithm to be divided on layers like "levelOne", "levelTwo"
Well, possibly, but that's not the proper design. If you want foo to be nothing but a namespace, and levelOne, levelTwo etc. to be instances of some type having both writer and worker methods, then you need to 1/ define your LevelXXX as classes, and 2/ build instances of those classes as attributes of your foo class, i.e.:
class LevelOne():
    def worker(self, i):
        print('doing hard work')  # ...
    def writer(self):
        print('writing workers work')  # ...

class foo():
    levelOne = LevelOne()
Now, whether this is the correct design for your use case is not guaranteed in any way, but it's impossible to design a proper solution without knowing anything about the problem...
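Below is a runnable sketch of that design, wired to the threads from the question. Assumptions that come from neither the question nor the answer: the workers hand their results to the writer through a queue.Queue, a None sentinel tells the writer to stop, and the result strings are placeholders. Here levelOne is created per instance in __init__ instead of as a class attribute, so two foo objects would not share state. The class-attribute form shown above works too:

```python
import queue
import threading

class LevelOne:
    def __init__(self):
        self.results = queue.Queue()  # hypothetical channel from workers to writer
        self.collected = []

    def worker(self, i):
        print('doing hard work', i)
        self.results.put('result from worker %d' % i)

    def writer(self):
        # Collect results until the sentinel arrives.
        while True:
            item = self.results.get()
            if item is None:
                break
            self.collected.append(item)
            print('writing', item)

class foo:
    def __init__(self):
        self.levelOne = LevelOne()

session = foo()
writer_thread = threading.Thread(target=session.levelOne.writer)
writer_thread.start()
threads = [threading.Thread(target=session.levelOne.worker, args=(i,))
           for i in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
session.levelOne.results.put(None)  # all workers are done: stop the writer
writer_thread.join()
```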
If possible, could you explain why accessing worker and writer as shown in the question's code is bad design?
Well, for the mere reason that it doesn't work, to start with, obviously xD.
Note that you could return the "worker" and "writer" functions from the levelOne method, i.e.:
class foo:
    def levelOne(self):
        # worker and writer close over levelOne's self,
        # so they take no self parameter of their own
        def worker(i):
            print('doing hard work')
        def writer():
            print('writing workers work')
        return worker, writer

session = foo()
worker, writer = session.levelOne()
# etc
but this is both convoluted (assuming the point is to let worker and writer share self, which is much more simply done using a proper LevelOne class and making worker and writer methods of this class) and inefficient (def is an executable statement, so with your solution the worker and writer functions are created anew - which is not free - on each call).
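You can observe the "created anew on each call" point directly. The sketch reuses the closure version of levelOne, with inner functions that take no self parameter. Two calls return two distinct function objects, although Python reuses the same compiled code object, so only the function objects themselves are rebuilt:

```python
class foo:
    def levelOne(self):
        def worker(i):
            print('doing hard work', i)
        def writer():
            print('writing workers work')
        return worker, writer

session = foo()
worker_a, _ = session.levelOne()
worker_b, _ = session.levelOne()
print(worker_a is worker_b)                    # False: def ran again
print(worker_a.__code__ is worker_b.__code__)  # True: same compiled code
```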

Python: how to get results from a function call in real time?

What I want is to get results from a function call in real time.
For example, I want to get the value of i in class model in real time. However, if I use return, I can only get the value of i once.
import threading

class model(object):
    """docstring for model"""
    def __init__(self):
        pass
    def func(self):
        for i in range(1000):
            print('i', i)
        return i

class WorkThread(threading.Thread):
    # trigger = pyqtSignal()
    def __init__(self):
        super(WorkThread, self).__init__()
    def run(self):
        model1 = model()
        result = model1.func()  # I want to get `i` from class model in real time, however return only gives it once.
        print('result', result)

if __name__ == '__main__':
    work_thread = WorkThread()
    work_thread.start()
    for j in range(1000, 2000):
        print('j', j)
Does anyone have a good idea? Any help is appreciated.
You have several options; you could:
Use a generator function to produce the results as you iterate. This requires that the code calling model1.func() (here, WorkThread.run()) loops over the generator that the call returns. Use this if you don't need access to the data from another thread.
Use a queue: push each i into the queue as you produce it, and another thread can receive the values from the queue.
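Both options can be sketched with the question's model class. Assumptions that do not come from the answer: func is rewritten as a generator (yield instead of return), the range is shortened to 5 to keep the output readable, and None is used as an end-of-data sentinel on the queue:

```python
import queue
import threading

class model(object):
    def func(self, n=5):
        # Option 1: a generator yields each i as soon as it is produced.
        for i in range(n):
            yield i

class WorkThread(threading.Thread):
    def __init__(self, results):
        super().__init__()
        self.results = results  # Option 2: a queue shared with the main thread

    def run(self):
        for i in model().func():  # loop over the generator
            self.results.put(i)   # hand each value over right away
        self.results.put(None)    # sentinel: no more values

if __name__ == '__main__':
    results = queue.Queue()
    work_thread = WorkThread(results)
    work_thread.start()
    received = []
    while True:
        i = results.get()  # blocks until the worker produces the next value
        if i is None:
            break
        received.append(i)
        print('i', i)
    work_thread.join()
    print(received)  # [0, 1, 2, 3, 4]
```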

Python class instance starts method in new thread

I spent the last hour(s???) looking/googling for a way to have a class start one of its methods in a new thread as soon as it is instantiated.
I could run something like this:
from threading import Thread
from time import sleep

x = myClass()
def updater():
    while True:
        x.update()
        sleep(0.01)

update_thread = Thread(target=updater)
update_thread.daemon = True
update_thread.start()
A more elegant way would be to have the class do this in __init__ when it is instantiated.
Imagine having 10 instances of that class...
Until now I couldn't find a (working) solution for this problem...
The actual class is a timer, and the method is an update method that updates all the counter variables. Since this class also has to run functions at a given time, it is important that the time updates aren't blocked by the main thread.
Any help is much appreciated. Thanks in advance...
You can subclass Thread directly in this specific case:
from threading import Thread
from time import sleep

class MyClass(Thread):
    def __init__(self):
        # add your own constructor arguments (other, arguments, here) as needed
        super(MyClass, self).__init__()
        self.daemon = True
        self.cancelled = False
        # do other initialization here

    def run(self):
        """Overrides Thread.run; runs the update
        method once every 10 milliseconds."""
        while not self.cancelled:
            self.update()
            sleep(0.01)

    def cancel(self):
        """End this timer thread"""
        self.cancelled = True

    def update(self):
        """Update the counters"""
        pass

my_class_instance = MyClass()

# explicit start is better than implicit start in constructor
my_class_instance.start()

# you can stop the thread with
my_class_instance.cancel()
To run a function (or member function) in a thread, use this:
from threading import Thread
th = Thread(target=some_func)
th.daemon = True
th.start()
Compared with deriving from Thread, this has the advantage that your class doesn't expose all of Thread's public methods as its own. You don't even need to write a class to use this code: self.function and global_function are equally usable as target here.
I'd also consider using a context manager to start/stop the thread; otherwise the thread might stay alive longer than necessary, leading to resource leaks and errors on shutdown. Since you're putting this into a class, start the thread in __enter__ and join it in __exit__.
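Below is a minimal sketch of that suggestion. It assumes a threading.Event serves as the stop flag; the class name UpdatingTimer, the interval parameter and the tick counter are hypothetical. The thread is created with target=self._loop, so the class does not inherit Thread's public methods. It starts in __enter__, and __exit__ signals the loop to stop and joins the thread:

```python
import threading
import time

class UpdatingTimer:
    """Hypothetical timer that calls update() periodically in a background thread."""

    def __init__(self, interval=0.01):
        self.interval = interval
        self.ticks = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def update(self):
        # Placeholder for "update all the counter variables".
        self.ticks += 1

    def _loop(self):
        # Event.wait returns True as soon as the event is set, so shutdown
        # does not have to wait out a full sleep interval.
        while not self._stop.wait(self.interval):
            self.update()

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._stop.set()
        self._thread.join()
        return False  # do not suppress exceptions

with UpdatingTimer(interval=0.01) as timer:
    time.sleep(0.2)  # the main thread is free to do other work here
print('updates performed:', timer.ticks)
```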

How to share object tree with process fork?

I don't have much experience with multithreading, and I'm trying to get something like the below working:
from multiprocessing import Process

class Node:
    def __init__(self):
        self.children = {}

class Test(Process):
    def __init__(self, tree):
        super().__init__()
        self.tree = tree
    def run(self):
        # infinite loop which does stuff to the tree
        self.tree.children[1] = Node()
        self.tree.children[2] = Node()

x = Node()
t = Test(x)
t.start()
print(x.children) # random access to tree
I realize this shouldn't (and doesn't) work for a variety of very sensible reasons, but I'm not sure how to get it to work. Referring to the documentation, it seems that I need to do something with Managers and Proxies, but I honestly have no idea where to start, or whether that is actually what I'm looking for. Could someone provide an example of the above that works?
multiprocessing has limited support for implicitly shared objects; its managers can even share lists and dicts.
In general, multiprocessing is shared-nothing (after the initial fork) and relies on explicit communication between the processes. This adds overhead (how much really depends on the kind of interaction between the processes), but neatly avoids a lot of the pitfalls of multithreaded programming. The high-level building blocks of multiprocessing favor master/slave models (esp. the Pool class), with masters handing out work items, and slaves operating on them, returning results.
Keeping state in sync across several processes may, depending on how often it changes, incur a prohibitive overhead.
TL;DR: It can be done, but probably shouldn't.
import time, multiprocessing

class Test(multiprocessing.Process):
    def __init__(self, manager):
        super().__init__()
        self.quit = manager.Event()
        self.dict = manager.dict()
    def stop(self):
        self.quit.set()
        self.join()
    def run(self):
        self.dict['item'] = 0
        while not self.quit.is_set():
            time.sleep(1)
            self.dict['item'] += 1

if __name__ == '__main__':
    m = multiprocessing.Manager()
    t = Test(m)
    t.start()
    for x in range(10):
        time.sleep(1.2)
        print(t.dict)
    t.stop()
The multiprocessing examples show how to create proxies for more complicated objects, which should allow you to implement the tree structure in your question.
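The sketch below applies that idea to the question's tree. It registers a custom class with a BaseManager subclass. Tree, TreeManager, grow and the path-tuple addressing are all hypothetical names chosen for illustration. The real Tree lives in the manager's server process, and other processes hold proxies that expose only its public methods. Every call is a round trip to that server, which is the overhead the answer warns about:

```python
from multiprocessing import Process
from multiprocessing.managers import BaseManager

class Node:
    def __init__(self):
        self.children = {}

class Tree:
    """Hypothetical tree kept in the manager process; nodes are addressed by key paths."""
    def __init__(self):
        self.root = Node()

    def _find(self, path):
        node = self.root
        for key in path:
            node = node.children[key]
        return node

    def add_child(self, path, key):
        self._find(path).children[key] = Node()

    def child_keys(self, path=()):
        return sorted(self._find(path).children)

class TreeManager(BaseManager):
    pass

TreeManager.register('Tree', Tree)

def grow(tree):
    # Runs in a child process; each call is forwarded to the manager.
    tree.add_child((), 1)
    tree.add_child((), 2)
    tree.add_child((1,), 'a')

if __name__ == '__main__':
    with TreeManager() as manager:
        tree = manager.Tree()  # a proxy, not the Tree object itself
        p = Process(target=grow, args=(tree,))
        p.start()
        p.join()
        print(tree.child_keys())      # [1, 2]
        print(tree.child_keys((1,)))  # ['a']
```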
It seems to me that what you want is actual multithreading, rather than multiprocessing. With threads rather than processes, you can do precisely that, since threads run in the same process, sharing all memory and therefore data with each other.
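Below is a threaded version of the question's example as a minimal sketch. The main thread reads the very same x that the worker modifies. A few details are assumptions rather than part of the answer: a threading.Lock guards the shared dict, since the GIL does not make compound operations atomic; the "infinite loop" checks an Event so it can be stopped; and the integer keys are placeholders:

```python
import threading
import time

class Node:
    def __init__(self):
        self.children = {}

class Test(threading.Thread):
    def __init__(self, tree, lock):
        super().__init__(daemon=True)
        self.tree = tree
        self.lock = lock  # hypothetical: guards access to the shared tree
        self.quit = threading.Event()

    def run(self):
        key = 0
        while not self.quit.is_set():
            with self.lock:
                self.tree.children[key] = Node()
            key += 1
            time.sleep(0.01)

    def stop(self):
        self.quit.set()
        self.join()

x = Node()
lock = threading.Lock()
t = Test(x, lock)
t.start()
time.sleep(0.1)
with lock:
    print(sorted(x.children))  # the same objects the thread is modifying
t.stop()
```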

