I'm new to the Python multiprocessing API. I have a custom subclass of multiprocessing.Process, let's call it MyProcess. Many examples I see define Queues in __main__ and then pass them to the Process constructor.
In my case, I spawn N Process subclasses and 2 Queues for each (pre- and post-process). I'd prefer to put the Queue initialization in each Process subclass:
import multiprocessing as mp

class MyProcess(mp.Process):
    def __init__(self, ID):
        mp.Process.__init__(self)
        self.name = ID
        self.queues = {'pre': mp.Queue(), 'post': mp.Queue()}

if __name__ == "__main__":
    my_proc = MyProcess(ID)
Rather than:
import multiprocessing as mp

class MyProcess(mp.Process):
    def __init__(self, ID, queues):
        mp.Process.__init__(self)
        self.name = ID
        self.queues = queues

if __name__ == "__main__":
    my_proc = MyProcess(ID, {'pre': mp.Queue(), 'post': mp.Queue()})
Is this possible or is there a pickle/sync/scope problem here?
After some testing, the latter appears to work just fine.
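To make the pattern concrete, here is a minimal, self-contained sketch of the first form (the run() body and the values are purely illustrative, not from the original post). The Queues are created in the parent process while __init__ runs and are carried to the child along with the Process object itself, which is the "inheritance" mechanism multiprocessing supports:

import multiprocessing as mp

class MyProcess(mp.Process):
    def __init__(self, ID):
        mp.Process.__init__(self)
        self.name = str(ID)
        # created in the parent, carried to the child with the Process object
        self.queues = {'pre': mp.Queue(), 'post': mp.Queue()}

    def run(self):
        # runs in the child process
        item = self.queues['pre'].get()
        self.queues['post'].put(item * 2)

if __name__ == "__main__":
    procs = [MyProcess(i) for i in range(3)]
    for p in procs:
        p.queues['pre'].put(21)  # the parent still holds references to the queues
        p.start()
    for p in procs:
        print(p.queues['post'].get())  # -> 42
        p.join()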
Here is a reproducible example:
from multiprocessing import Process, Manager

manager = Manager()
shared_results_dict = manager.dict()

class WorkerProcess(Process):
    def __init__(self, shared_results_dict):
        super(WorkerProcess, self).__init__()
        self.shared_results_dict = shared_results_dict

    def run(self):
        self.shared_results_dict['a'] = 3

subproc = WorkerProcess(shared_results_dict)
subproc.daemon = True
subproc.start()

shared_results_dict['a']
The code above works fine when the multiprocessing start method is set to fork, but it fails when set to either forkserver or spawn. I thought Manager was supposed to work with whatever start method I use?
If you are running in Jupyter Notebook, you need to put your Process subclass definition in a separate .py file and you will need to add a from multiprocessing import Process statement to that file. You will also need to put any code that creates subprocesses within a block controlled by if __name__ == '__main__':. Finally, you really need to wait for the subprocess to complete to be sure it has updated the dictionary if you are looking to print out an updated dictionary in the main process. Thus, it is pointless to use a daemon process:
File worker.py (for example)
from multiprocessing import Process

class WorkerProcess(Process):
    def __init__(self, shared_results_dict):
        super(WorkerProcess, self).__init__()
        self.shared_results_dict = shared_results_dict

    def run(self):
        self.shared_results_dict['a'] = 3
Your Jupyter Notebook Cell:
from multiprocessing import Manager
from worker import WorkerProcess

if __name__ == '__main__':
    manager = Manager()
    shared_results_dict = manager.dict()
    subproc = WorkerProcess(shared_results_dict)
    #subproc.daemon = True
    subproc.start()
    # wait for the process to terminate to be sure the dict has been updated
    subproc.join()
    print(shared_results_dict['a'])
Prints:
3
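Outside Jupyter the same rules apply. Here is a minimal single-file sketch (my own arrangement of the code above, not the original poster's script) that should work with any start method available on your platform, because everything that creates the Manager and starts the process sits under the __main__ guard and so is not re-executed when the child re-imports the module:

# worker_demo.py -- illustrative file name
from multiprocessing import Process, Manager, set_start_method

class WorkerProcess(Process):
    def __init__(self, shared_results_dict):
        super().__init__()
        self.shared_results_dict = shared_results_dict

    def run(self):
        self.shared_results_dict['a'] = 3

if __name__ == '__main__':
    set_start_method('spawn')            # or 'fork' / 'forkserver' where available
    manager = Manager()                  # created only in the parent
    shared_results_dict = manager.dict()
    subproc = WorkerProcess(shared_results_dict)
    subproc.start()
    subproc.join()                       # wait so the dict is definitely updated
    print(shared_results_dict['a'])      # -> 3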
I'm trying to understand why Python fails to create the following class:
class SharedResource(multiprocessing.Lock):
    def __init__(self, blocking=True, timeout=-1):
        # super().__init__(blocking=True, timeout=-1)
        self.blocking = blocking
        self.timeout = timeout
        self.data = {}
TypeError: method expected 2 arguments, got 3
The reason why I'm subclassing Lock:
My objective is to create a shared list of resources that should be usable by only one process at a time.
This will eventually be used in a Flask application where requests should not be able to use the resource concurrently.
My alternative attempt below fails with:
RuntimeError: Lock objects should only be shared between processes through inheritance
from multiprocessing import Process, Manager, Lock
import time

class SharedResource():
    def __init__(self, id, model):
        '''
        id: model id
        model: Keras model; only one worker at a time can call predict
        '''
        self.mutex = Lock()
        self.id = id
        self.model = model

manager = Manager()
shared_list = manager.list()  # a list of models
shared_list.append(SharedResource(...))  # id and model elided here

def worker1(l):
    # ...read some data
    while True:
        resource = l[0]
        with resource.mutex:
            resource.model.predict(...)  # ...some data
        time.sleep(60)

if __name__ == "__main__":
    processes = [Process(target=worker1, args=[shared_list])]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
You are getting this error because multiprocessing.Lock is actually a function.
In .../multiprocessing/context.py there are these lines:
def Lock(self):
    '''Returns a non-recursive lock object'''
    from .synchronize import Lock
    return Lock(ctx=self.get_context())
This may change in the future, so you can verify it on your version of Python by doing:
import multiprocessing
print(type(multiprocessing.Lock))
To actually subclass Lock you will need to do something like this:
from multiprocessing.synchronize import Lock

# Since Lock is now a class, this should work:
class SharedResource(Lock):
    pass
I'm not endorsing this approach as a "good" solution, but it should solve your problem if you really need to subclass Lock. Subclassing things that try to avoid being subclassed is usually not a great idea, but sometimes it can be necessary. If you can solve the problem in a different way you may want to consider that.
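If you do go this route, note that (at least on current CPython) the synchronize.Lock constructor expects a keyword-only ctx argument, so the subclass has to supply one when it is instantiated. A minimal sketch, using get_context() as my assumed way of providing it:

import multiprocessing
from multiprocessing.synchronize import Lock

class SharedResource(Lock):
    def __init__(self):
        # synchronize.Lock takes a keyword-only ctx argument
        super().__init__(ctx=multiprocessing.get_context())

lock = SharedResource()
with lock:
    ...  # only one holder at a time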
I have a simple Python script which runs a parallel pool:
import multiprocessing as mp

def square(x):
    return x**2

if __name__ == '__main__':
    pool = mp.Pool(4)
    results = pool.map(square, range(1, 20))
It works fine and as expected. However, if I import any simple custom class, such as the code below, it doesn't work any more. I start the script execution, but the script does not terminate. This is weird, as I do not use the imported class.
import multiprocessing as mp
from Person import Person

def square(x):
    return x**2

if __name__ == '__main__':
    pool = mp.Pool(4)
    results = pool.map(square, range(1, 20))
The class Person is very simple:
class Person:
    def __init__(self, id):
        self.id = id
What is the reason for this behavior and how can I fix it?
EDIT: I am using Windows 10
The docs for multiprocessing.set_start_method note that:
Note that this should be called at most once, and it should be protected inside the if __name__ == '__main__' clause of the main module.
However, if I put multiprocessing.set_start_method('spawn') in a pytest module fixture, I do not know whether it will work properly.
Indeed, as stated in the documentation, you will be in trouble if you try to call multiprocessing.set_start_method() from multiple unit test functions. Moreover, it affects your whole program and may interact badly with the rest of your test suite.
However, there exists a workaround which is described in the documentation too:
Alternatively, you can use get_context() to obtain a context
object. Context objects have the same API as the multiprocessing
module, and allow one to use multiple start methods in the same
program.
import multiprocessing as mp

def foo(q):
    q.put('hello')

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    p = ctx.Process(target=foo, args=(q,))
    p.start()
    print(q.get())
    p.join()
This method can be used per-test to avoid the compatibility issues discussed above. It can be combined with monkeypatching or mocking to test your class with different start methods:
# my_class.py
import multiprocessing

class MyClass:
    def __init__(self):
        self._queue = multiprocessing.Queue()

    def process(self, x):
        # Very simplified example of a method using a multiprocessing Queue
        self._queue.put(x)
        return self._queue.get()
# tests/test_my_class.py
import multiprocessing

import my_class

def test_spawn(monkeypatch):
    ctx = multiprocessing.get_context('spawn')
    monkeypatch.setattr(my_class.multiprocessing, "Queue", ctx.Queue)
    obj = my_class.MyClass()
    assert obj.process(6) == 6

def test_fork(monkeypatch):
    ctx = multiprocessing.get_context('fork')
    monkeypatch.setattr(my_class.multiprocessing, "Queue", ctx.Queue)
    obj = my_class.MyClass()
    assert obj.process(6) == 6
If you really do always want to use the same start method, you can set it in a session-scoped fixture in the file conftest.py in the root of your source tree. E.g.
# conftest.py
import multiprocessing

import pytest

@pytest.fixture(scope="session", autouse=True)
def always_spawn():
    multiprocessing.set_start_method("spawn")
(I found a decent solution here for this, but unfortunately I'm using IronPython, which does not implement the multiprocessing module ...)
Driving script Threader.py will call Worker.py's single function twice, using the threading module.
Its single function just fetches a dictionary of data.
Roughly speaking:
Worker.py
def GetDict():
    :
    :
    :
    return theDict
Threader.py
import threading
from Worker import GetDict
:
:
:
def ThreadStart():
    t = threading.Thread(target=GetDict)
    t.start()
:
:
In the driver script Threader.py, I want to be able to operate on the two dictionaries returned by the two calls into Worker.py.
The accepted answer here involving the Queue module seems to be what I need in terms of accessing return values, but it is written from the point of view of everything being done in a single script. How do I go about making the return values of the function called in Worker.py available to Threader.py (or any other script, for that matter)?
Many thanks
Another way to do what you want (without using a Queue) would be to use the concurrent.futures module (available since Python 3.2; for earlier versions there is a backport).
Using this, your example would work like this:
from concurrent import futures

def GetDict():
    return {'foo': 'bar'}

# imports ...
# from Worker import GetDict

def ThreadStart():
    executor = futures.ThreadPoolExecutor(max_workers=4)

    future = executor.submit(GetDict)
    print(future.result())  # blocks until GetDict finished

    # or doing more than one:
    jobs = [executor.submit(GetDict) for i in range(10)]
    for j in jobs:
        print(j.result())

if __name__ == '__main__':
    ThreadStart()
edit:
Something similar would be to use your own Thread subclass to execute the target function and save its return value, something like this:
from threading import Thread

def GetDict():
    return {'foo': 'bar'}

# imports ...
# from Worker import GetDict

class WorkerThread(Thread):
    def __init__(self, fnc, *args, **kwargs):
        super(WorkerThread, self).__init__()
        self.fnc = fnc
        self.args = args
        self.kwargs = kwargs

    def run(self):
        self.result = self.fnc(*self.args, **self.kwargs)

def ThreadStart():
    jobs = [WorkerThread(GetDict) for i in range(10)]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
        print(j.result)

if __name__ == '__main__':
    ThreadStart()
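Since your question is specifically about splitting this across two files, here is a minimal sketch of how the first (concurrent.futures) variant maps onto the file names from your question; nothing here goes beyond the approaches shown above:

# Worker.py
def GetDict():
    return {'foo': 'bar'}

# Threader.py
from concurrent import futures
from Worker import GetDict

def ThreadStart():
    with futures.ThreadPoolExecutor(max_workers=2) as executor:
        jobs = [executor.submit(GetDict) for _ in range(2)]
        return [j.result() for j in jobs]  # the two dicts produced in Worker.py

if __name__ == '__main__':
    print(ThreadStart())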