python multiprocessing manager connect creates another object

I would like to create a shared object among processes. First I created a server process which hosts an instance of ProcessClass. Then I created another process from which I want to connect to the shared object.
But connecting from that other process created its own instance of ProcessClass.
So what do I need to do to access the remote shared object?
Here is my test code.
from multiprocessing.managers import BaseManager
from multiprocessing import Process

class ProcessClass:
    def __init__(self):
        self._state = False

    def set(self):
        self._state = True

    def get(self):
        return self._state

class MyManager(BaseManager):
    pass

def another_process():
    MyManager.register('my_object')
    m = MyManager(address=('', 50000))
    m.connect()
    proxy = m.my_object()
    print(f'state from another process: {proxy.get()}')

def test_spawn_and_terminate_process():
    MyManager.register('my_object', ProcessClass)
    m = MyManager(address=('', 50000))
    m.start()
    proxy = m.my_object()
    proxy.set()
    print(f'state from main process: {proxy.get()}')
    p = Process(target=another_process)
    p.start()
    p.join()
    print(f'state from main process: {proxy.get()}')

if __name__ == '__main__':
    test_spawn_and_terminate_process()
Output is
python test_communication.py
state from main process: True
state from another process: False
state from main process: True

Your code is working as it is supposed to. If you look at the documentation for multiprocessing.managers.SyncManager you will see that there is, for example, a method dict() to create a shareable dictionary. Would you expect that calling this method multiple times would return the same dictionary over and over again or new instances of sharable dictionaries?
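You can see the same behaviour with the built-in typeids: each call to a registered factory creates a fresh object in the manager process. A quick illustration using SyncManager's dict() (my own sketch, not part of the original answer):

from multiprocessing import Manager

if __name__ == '__main__':
    with Manager() as m:
        d1 = m.dict()
        d2 = m.dict()
        d1['x'] = 1
        print(d2)  # prints {} -- each dict() call created a separate shared dict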
What you need to do is enforce a singleton instance to be used repeatedly for successive invocations of proxy = m.my_object() and the way to do that is to first define the following function:
singleton = None

def get_singleton_process_instance():
    global singleton
    if singleton is None:
        singleton = ProcessClass()
    return singleton
Then you need to make a one-line change in function test_spawn_and_terminate_process:
def test_spawn_and_terminate_process():
    # MyManager.register('my_object', ProcessClass)
    MyManager.register('my_object', get_singleton_process_instance)
This ensures that, to satisfy requests for 'my_object', the manager always invokes get_singleton_process_instance() (returning the singleton) instead of ProcessClass(), which would return a new instance each time.
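With that one-line change, rerunning the script should show both processes observing the same shared instance. The expected output (my own expectation, not copied from the original post) would be:

state from main process: True
state from another process: True
state from main process: True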

Related

Python multiprocessing the value set in run function not reflected in the object

I am running the below code
from multiprocessing import Process
from multiprocessing import Value

class TestMultiprocess(Process):
    def __init__(self):
        super().__init__()
        self.variable = {"value": "initial"}

    def run(self):
        print("executing the run function")
        self.variable = {"value": "new"}

t = TestMultiprocess()
t.start()
t.join()
print(t.variable)  # prints {"value": "initial"} (I expected {"value": "new"})
I am setting the value of the variable in the run function, but this value is not reflected in the object (it still holds the value initialised in the constructor). Why is this?
How do I set/change an attribute value in the run function of a class that inherits from multiprocessing.Process and then use that value from the client code which creates the object of the class? I looked at multiprocessing.Value but it does not have support for dictionaries, and the attribute value I need to set is a dictionary.
Your process is running in a different address space. Therefore, your TestMultiprocess instance must be serialized and sent to that child process's address space, where it is then deserialized (all of this is done with the pickle module). So what is being updated is a copy of your instance that lives in the child process's address space; the main process's instance is never updated.
The simplest solution is to use a managed dictionary. The actual dictionary "lives" in the address space of the Manager, and the variable managed_dict in the code below is actually a proxy object: when one of its methods is called, the method name and its arguments are sent to the Manager process, where the actual dictionary is updated:
from multiprocessing import Process, Manager

class TestMultiprocess(Process):
    def __init__(self, managed_dict):
        super().__init__()
        self.managed_dict = managed_dict
        self.managed_dict["value"] = "initial"

    def run(self):
        print("executing the run function")
        self.managed_dict["value"] = "new"

# Required for Windows:
if __name__ == '__main__':
    with Manager() as manager:
        managed_dict = manager.dict()
        t = TestMultiprocess(managed_dict)
        print(t.managed_dict)
        t.start()
        t.join()
        print(t.managed_dict)
Prints:
{'value': 'initial'}
executing the run function
{'value': 'new'}
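If all that is needed is to hand a single result back from run(), a plain multiprocessing.Queue is a lighter alternative to a managed dictionary. A minimal sketch of that approach (my own, not from the original answer):

from multiprocessing import Process, Queue

class TestMultiprocess(Process):
    def __init__(self, result_queue):
        super().__init__()
        self.result_queue = result_queue
        self.variable = {"value": "initial"}

    def run(self):
        # runs in the child process; ship the updated dict back explicitly
        self.variable = {"value": "new"}
        self.result_queue.put(self.variable)

if __name__ == '__main__':
    q = Queue()
    t = TestMultiprocess(q)
    t.start()
    print(q.get())  # {'value': 'new'}
    t.join()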

Can I dynamically register objects to proxy with a multiprocessing BaseManager?

There are plenty of examples of using a multiprocessing BaseManager-derived class to register a method that returns a queue handle proxy, which clients can then use to pull/put from the queue.
This is great, but I have a different scenario: what if the number of queues that I need to proxy changes in response to outside events? What I really want is to proxy a method that returns a specific queue given a UID.
I tried this out but I couldn't get it to work; it appears that the only things available are whatever was registered with the class before the object was instantiated. I'm unable to BaseManager.register("my-new-queue", lambda: queue.Queue) once I've already instantiated an instance of that class and caused it to run.
Is there any way around this? It feels to me like we should be able to handle this dynamically.
The registration is most important in the "server" process, where the callable will actually get called. Registering a callable in a "client" process only adds that typeid (the string you pass to register) as a method on the manager class. The rub is that running the server blocks, preventing you from registering new callables, and it occurs in another process, which makes it even harder to modify the registry.
I've been tinkering with this for a little while... imo managers are cursed. I think your prior question would also be answered (aside from our discussion in the comments) by the thing that solved this one. Basically, Python tries to be somewhat careful about not sending the authkey parameter around for proxied objects, but it stumbles sometimes (particularly with nested proxies). The fix is to set the default authkey for the process, mp.current_process().authkey = b'abracadabra', which is used as the fallback when authkey=None (https://bugs.python.org/issue7503).
Here's my full testing script, which is derived from the remote manager example in the docs. Basically, I create a shared dict to hold shared queues:
# server process
from multiprocessing.managers import BaseManager, DictProxy
from multiprocessing import current_process
from queue import Queue

queues = {}  # dict[uuid, Queue]

class QueueManager(BaseManager):
    pass

QueueManager.register('new_queue', callable=Queue)
QueueManager.register('get_queues', callable=lambda: queues, proxytype=DictProxy)

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
current_process().authkey = b'abracadabra'
s = m.get_server()
s.serve_forever()


# process A
from multiprocessing.managers import BaseManager
from multiprocessing import current_process

class QueueManager(BaseManager):
    pass

QueueManager.register('new_queue')
QueueManager.register('get_queues')

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
current_process().authkey = b'abracadabra'

queues_dict = m.get_queues()
queues_dict['my_uuid'] = m.new_queue()
queues_dict['my_uuid'].put("this is a test")


# process B
from multiprocessing.managers import BaseManager
from multiprocessing import current_process

class QueueManager(BaseManager):
    pass

QueueManager.register('new_queue')
QueueManager.register('get_queues')

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
current_process().authkey = b'abracadabra'

queues_dict = m.get_queues()
print(queues_dict['my_uuid'].get())
EDIT:
Regarding the comment "get_queue take the UUID and return the specific queue": the modification is simple and does not involve nested proxies, thereby avoiding the digest auth issue:
# server process
from multiprocessing.managers import BaseManager
from collections import defaultdict
from queue import Queue

queues = defaultdict(Queue)

class QueueManager(BaseManager): pass
QueueManager.register('get_queue', callable=lambda uuid: queues[uuid])

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
s = m.get_server()
s.serve_forever()


# process A
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager): pass
QueueManager.register('get_queue')

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
m.get_queue("my_uuid").put("this is a test")


# process B
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager): pass
QueueManager.register('get_queue')

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
print(m.get_queue("my_uuid").get())
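The defaultdict(Queue) on the server is doing the heavy lifting here: the first lookup of an unknown UUID creates its queue, and every later lookup for the same UUID returns that same queue. A quick local illustration of that behaviour (my own sketch, not part of the answer):

from collections import defaultdict
from queue import Queue

queues = defaultdict(Queue)
q = queues["some-uuid"]          # first access creates a new Queue
assert queues["some-uuid"] is q  # later accesses return the same Queue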
Aaron's answer is perhaps the simplest way here: you share a dictionary and store the queues in that shared dictionary. However, it does not solve the problem of not being able to update the methods on a manager once it has started. Therefore, here is a more complete solution, less verbose than its alternative, where you can update the registry even after the server has started:
from queue import Queue
from multiprocessing.managers import SyncManager, Server, State, dispatch
from multiprocessing.context import ProcessError

class UpdateServer(Server):
    public = ['shutdown', 'create', 'accept_connection', 'get_methods',
              'debug_info', 'number_of_objects', 'dummy', 'incref', 'decref',
              'update_registry']

    def update_registry(self, c, registry):
        with self.mutex:
            self.registry.update(registry)

class UpdateManager(SyncManager):
    _Server = UpdateServer

    def get_server(self):
        # Same as BaseManager.get_server, but returns our UpdateServer so that
        # serving in the main process also supports update_registry
        if self._state.value != State.INITIAL:
            if self._state.value == State.STARTED:
                raise ProcessError("Already started server")
            elif self._state.value == State.SHUTDOWN:
                raise ProcessError("Manager has shut down")
            else:
                raise ProcessError(
                    "Unknown state {!r}".format(self._state.value))
        return self._Server(self._registry, self._address,
                            self._authkey, self._serializer)

    def update_registry(self):
        assert self._state.value == State.STARTED, 'server not yet started'
        conn = self._Client(self._address, authkey=self._authkey)
        try:
            dispatch(conn, None, 'update_registry', (type(self)._registry,), {})
        finally:
            conn.close()

class MyQueue:
    def __init__(self):
        self.initialized = False
        self.q = None

    def initialize(self):
        self.q = Queue()

    def __call__(self):
        if not self.initialized:
            self.initialize()
            self.initialized = True
        return self.q

if __name__ == '__main__':
    # Create an object of the wrapper class; note that we do not initialize the
    # queue right away (it's unpicklable)
    queue = MyQueue()
    manager = UpdateManager()
    manager.start()

    # If you register new typeids, then call update_registry. The method_to_typeid
    # parameter maps the __call__ method to return a proxy of the queue instead,
    # since Queues are not picklable
    UpdateManager.register('uuid', queue, method_to_typeid={'__call__': 'Queue'})
    manager.update_registry()

    # Once the object is stored in the manager process, we can safely initialize
    # the queue and share it among processes. Initialization is implicit when we
    # call uuid() if it's not already initialized
    q = manager.uuid()
    q.put('bye')
    print(q.get())
Over here, UpdateServer and UpdateManager add support for the method update_registry, which informs the server if any new typeids are registered with the manager. MyQueue is simply a wrapper class that returns the newly registered queue when called directly. While it's functionally similar to registering lambda: queue, the wrapper is necessary because lambda functions are not picklable and the server process is being started in a new process here (rather than doing server.serve_forever() in the main process, but you can do that too if you want).
So, you can now register typeids even after the manager process is running; just make sure to call the update_registry function right after. This function call will even work if you are starting the server in the main process itself (by using serve_forever, like in Aaron's answer) and connecting to it from another process using manager.connect.
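The picklability constraint is easy to verify: anything placed in the registry has to travel to the already-running server process, and pickle refuses to serialize a lambda (quick illustration, not from the answer):

import pickle

# lambdas cannot be pickled, which is why registering `lambda: queue`
# directly would fail when the registry is shipped to the server process
pickle.dumps(lambda: 42)  # raises pickle.PicklingError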

Pass complex object instance to class that subclasses process

I have a large Python 3.6 system where multiple processes and threads interact with each other and the user. Simplified, there is a Scheduler instance (subclasses threading.Thread) and a Worker instance (subclasses multiprocessing.Process). Both objects run for the entire duration of the program.
The user interacts with the Scheduler by adding Task instances and the Scheduler passes the task to the Worker at the correct moment in time. The worker uses the information contained in the task to do its thing.
Below is some stripped-down and simplified code from the project:
import threading
import multiprocessing

class Task:
    def __init__(self, name: str):
        self.name = name
        self.state = 'idle'

class Scheduler(threading.Thread):
    def __init__(self, worker: 'Worker'):
        super().__init__()
        self.worker = worker
        self.start()

    def run(self):
        while True:
            # Do stuff until the user schedules a new task
            task = Task()  # <-- In reality the Task instance is not created here but the thread gets it from elsewhere
            task.state = 'scheduled'
            self.worker.change_task(task)
            # Do stuff until the task.state == 'finished'

class Worker(multiprocessing.Process):
    def __init__(self):
        super().__init__()
        self.current_task = None
        self.start()

    def change_task(self, new_task: Task):
        self.current_task = new_task
        self.current_task.state = 'accepted-idle'

    def run(self):
        while True:
            # Do stuff until the current task is updated
            self.current_task.state = 'accepted-running'
            # Task is running
            self.current_task.state = 'finished'
The system used to be structured so that the task contained multiple multiprocessing.Event objects indicating each of its possible states. Then, not the whole Task instance was passed to the worker, but each of the task's attributes was. As they were all multiprocessing-safe, it worked, with a caveat: the events changed in worker.run had to be created in worker.run and passed back to the task object for it to work. Not only is this a less-than-ideal solution, it no longer works with some changes I am making to the project.
Back to the current state of the project, as described by the Python code above. As is, this will never work because nothing makes it multiprocessing-safe at the moment. So I implemented a Proxy/BaseManager structure so that when a new Task is needed, the system gets it from the multiprocessing manager. I use this structure in a slightly different way elsewhere in the project as well. The issue is that worker.run never learns that self.current_task has been updated; it remains None. I expected this to be fixed by using the proxy, but clearly I am mistaken.
import types
import typing
from multiprocessing.managers import BaseManager, NamespaceProxy

def Proxy(target: typing.Type) -> typing.Type:
    """
    Normally a Manager only exposes object methods. A NamespaceProxy can be used when registering the object with
    the manager to expose all the attributes. This also works for attributes created at runtime.
    https://stackoverflow.com/a/68123850/8353475
    1. Instead of exposing all the attributes manually, we effectively override __getattr__ to do it dynamically.
    2. Instead of defining a class that subclasses NamespaceProxy for each specific object class that needs to be
    proxied, this method is used to do it dynamically. The target parameter should be the class of the object you want
    to generate the proxy for. The generated proxy class will be returned.
    Example usage: FooProxy = Proxy(Foo)
    :param target: The class of the object to build the proxy class for
    :return: The generated proxy class
    """

    # __getattr__ is called when an attribute 'bar' is accessed on 'foo' and is not found, e.g. 'foo.bar'. 'bar' can
    # be a class method as well as a variable. The call gets rerouted from the base object to this proxy, where it is
    # processed.
    def __getattr__(self, key):
        result = self._callmethod('__getattribute__', (key,))
        # If the attr call was for a method we need some further processing
        if isinstance(result, types.MethodType):
            # A wrapper around the method that passes the arguments, actually calls the method and returns the result.
            # Note that at this point wrapper() does not get called, just defined.
            def wrapper(*args, **kwargs):
                # Call the method and pass the return value along
                return self._callmethod(key, args, kwargs)
            # Return the wrapper method (not the result, but the method itself)
            return wrapper
        else:
            # If the attr call was for a variable it can be returned as is
            return result

    dic = {'types': types, '__getattr__': __getattr__}
    proxy_name = target.__name__ + "Proxy"
    ProxyType = type(proxy_name, (NamespaceProxy,), dic)
    # This is a tuple of all the attributes that are/will be exposed. We copy all of them from the base class
    ProxyType._exposed_ = tuple(dir(target))
    return ProxyType

class TaskManager(BaseManager):
    pass

TaskProxy = Proxy(Task)
TaskManager.register('get_task', callable=Task, proxytype=TaskProxy)
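For context, here is a rough usage sketch of the registration above (my own illustration, untested against the poster's full project; it assumes the snippet's imports are in place and that the dynamically created proxy class is available to the manager process, e.g. with the default fork start method on Linux):

if __name__ == '__main__':
    manager = TaskManager()
    manager.start()

    task = manager.get_task('demo')  # Task('demo') is created in the manager process
    task.state = 'scheduled'         # attribute writes go through the NamespaceProxy
    print(task.name, task.state)     # attribute reads go through the proxy as well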

How to run Python custom objects in separate processes, all working on a shared events queue?

I have 4 different Python custom objects and an events queue. Each object has a method that allows it to retrieve an event from the shared events queue, process it if the type is the desired one, and then put a new event on the same events queue, allowing other processes to process it.
Here's an example.
import multiprocessing as mp

class CustomObject:
    def __init__(self, events_queue: mp.Queue) -> None:
        self.events_queue = events_queue

    def process_events_queue(self) -> None:
        event = self.events_queue.get()
        if type(event) == SpecificEventDataTypeForThisClass:
            # do something and create a new_event
            self.events_queue.put(new_event)
        else:
            self.events_queue.put(event)

    # there are other methods specific to each object
These 4 objects have specific tasks to do, but they all share this same structure. Since I need to "simulate" the production conditions, I want them all to run at the same time, independently from each other.
Here's just an example of what I want to do, if possible.
import multiprocessing as mp
import CustomObject

if __name__ == '__main__':
    events_queue = mp.Queue()
    data_provider = mp.Process(target=CustomObject, args=(events_queue,))
    portfolio = mp.Process(target=CustomObject, args=(events_queue,))
    engine = mp.Process(target=CustomObject, args=(events_queue,))
    broker = mp.Process(target=CustomObject, args=(events_queue,))
    while True:
        data_provider.process_events_queue()
        portfolio.process_events_queue()
        engine.process_events_queue()
        broker.process_events_queue()
My idea is to run each object in a separate process, allowing them to communicate with events shared through the events_queue. So my question is, how can I do that?
The problem is that obj = mp.Process(target=CustomObject, args=(events_queue,)) returns a Process instance and I can't access the CustomObject methods from it. Also, is there a smarter way to achieve what I want?
Processes require a function to run, which defines what the process is actually doing. Once this function exits (and there are no non-daemon threads) the process is done. This is similar to how Python itself always executes a __main__ script.
If you do mp.Process(target=CustomObject, args=(events_queue,)) that just tells the process to call CustomObject - which instantiates it once and then is done. This is not what you want, unless the class actually performs work when instantiated - which is a bad idea for other reasons.
Instead, you must define a main function or method that handles what you need: "communicate with events shared through the events_queue". This function should listen to the queue and take action depending on the events received.
A simple implementation looks like this:
import os, time
from multiprocessing import Queue, Process

class Worker:
    # separate input and output for simplicity
    def __init__(self, commands: Queue, results: Queue):
        self.commands = commands
        self.results = results

    # our main function to be run by a process
    def main(self):
        # each process should handle more than one command
        while True:
            value = self.commands.get()
            # pick a well-defined signal to detect "no more work"
            if value is None:
                self.results.put(None)
                break
            # do whatever needs doing
            result = self.do_stuff(value)
            print(os.getpid(), ':', self, 'got', value, 'put', result)
            time.sleep(0.2)  # pretend we do something
            # pass on more work if required
            self.results.put(result)

    # placeholder for what needs doing
    def do_stuff(self, value):
        raise NotImplementedError
This is a template for a class that just keeps on processing events. The do_stuff method must be overridden to define what actually happens.
class AddTwo(Worker):
    def do_stuff(self, value):
        return value + 2

class TimesThree(Worker):
    def do_stuff(self, value):
        return value * 3

class Printer(Worker):
    def do_stuff(self, value):
        print(value)
This already defines fully working process payloads: Process(target=TimesThree(in_queue, out_queue).main) schedules the main method in a process, listening for and responding to commands.
Running this mainly requires connecting the individual components:
if __name__ == '__main__':
    # bookkeeping of resources we create
    processes = []
    start_queue = Queue()
    # connect our workers via queues
    queue = start_queue
    for element in (AddTwo, TimesThree, Printer):
        instance = element(queue, Queue())
        # we run the main method in processes
        processes.append(Process(target=instance.main))
        queue = instance.results
    # start all processes
    for process in processes:
        process.start()
    # send input, but do not wait for output
    start_queue.put(1)
    start_queue.put(248124)
    start_queue.put(-256)
    # send shutdown signal
    start_queue.put(None)
    # wait for processes to shutdown
    for process in processes:
        process.join()
Note that you do not need classes for this. You can also compose functions for a similar effect, as long as everything is pickle-able:
import os, time
from multiprocessing import Queue, Process

def main(commands, results, do_stuff):
    while True:
        value = commands.get()
        if value is None:
            results.put(None)
            break
        result = do_stuff(value)
        print(os.getpid(), ':', do_stuff, 'got', value, 'put', result)
        time.sleep(0.2)
        results.put(result)

def times_two(value):
    return value * 2

if __name__ == '__main__':
    in_queue, out_queue = Queue(), Queue()
    worker = Process(target=main, args=(in_queue, out_queue, times_two))
    worker.start()
    for message in (1, 3, 5, None):
        in_queue.put(message)
    while True:
        reply = out_queue.get()
        if reply is None:
            break
        print('result:', reply)

Make Singleton class in Multiprocessing

I create a Singleton class using a metaclass. It works fine with multiple threads and creates only one instance of the MySingleton class, but with multiprocessing it always creates a new instance.
import multiprocessing

class SingletonType(type):
    # meta class for making a class singleton
    def __call__(cls, *args, **kwargs):
        try:
            return cls.__instance
        except AttributeError:
            cls.__instance = super(SingletonType, cls).__call__(*args, **kwargs)
            return cls.__instance

class MySingleton(object):
    # singleton class
    __metaclass__ = SingletonType

    def __init__(*args, **kwargs):
        print "init called"

def task():
    # create singleton class instance
    a = MySingleton()

# create two process
pro_1 = multiprocessing.Process(target=task)
pro_2 = multiprocessing.Process(target=task)

# start process
pro_1.start()
pro_2.start()
My output:
init called
init called
I need the MySingleton class's __init__ method to be called only once.
Each of your child processes runs its own instance of the Python interpreter, hence the SingletonType in one process doesn't share its state with those in another process. This means that a true singleton that only exists in one of your processes will be of little use, because you won't be able to use it in the other processes: while you can manually share data between processes, that is limited to only basic data types (for example dicts and lists).
Instead of relying on singletons, simply share the underlying data between the processes:
#!/usr/bin/env python3

import multiprocessing
import os

def log(s):
    print('{}: {}'.format(os.getpid(), s))

class PseudoSingleton(object):
    def __init__(*args, **kwargs):
        if not shared_state:
            log('Initializating shared state')
            with shared_state_lock:
                shared_state['x'] = 1
                shared_state['y'] = 2
            log('Shared state initialized')
        else:
            log('Shared state was already initalized: {}'.format(shared_state))

def task():
    a = PseudoSingleton()

if __name__ == '__main__':
    # We need the __main__ guard so that this part is only executed in
    # the parent
    log('Communication setup')
    shared_state = multiprocessing.Manager().dict()
    shared_state_lock = multiprocessing.Lock()

    # create two process
    log('Start child processes')
    pro_1 = multiprocessing.Process(target=task)
    pro_2 = multiprocessing.Process(target=task)
    pro_1.start()
    pro_2.start()

    # Wait until processes have finished
    # See https://stackoverflow.com/a/25456494/857390
    log('Wait for children')
    pro_1.join()
    pro_2.join()
    log('Done')
This prints
16194: Communication setup
16194: Start child processes
16194: Wait for children
16200: Initializating shared state
16200: Shared state initialized
16201: Shared state was already initalized: {'x': 1, 'y': 2}
16194: Done
However, depending on your problem setting there might be better solutions using other mechanisms of inter-process communication. For example, the Queue class is often very useful.
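For example, a pair of queues lets each child receive work and report results without any shared singleton at all. A minimal sketch (my own, not from the original answer):

import multiprocessing

def worker(jobs, results):
    # pull jobs until the sentinel None arrives
    for value in iter(jobs.get, None):
        results.put(value * value)

if __name__ == '__main__':
    jobs = multiprocessing.Queue()
    results = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(jobs, results))
    p.start()
    for n in (1, 2, 3):
        jobs.put(n)
    jobs.put(None)  # tell the worker to stop
    print(sorted(results.get() for _ in range(3)))  # [1, 4, 9]
    p.join()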
