I'm trying to understand why Python cannot compile the following class:
class SharedResource(multiprocessing.Lock):
    def __init__(self, blocking=True, timeout=-1):
        # super().__init__(blocking=True, timeout=-1)
        self.blocking = blocking
        self.timeout = timeout
        self.data = {}
TypeError: method expected 2 arguments, got 3
The reason why I'm subclassing Lock
My objective is to create a shared list of resources, each of which should be usable by only one process at a time.
This concept will eventually be used in a Flask application, where requests should not be able to use a resource concurrently.
RuntimeError: Lock objects should only be shared between processes through inheritance
class SharedResource():
    def __init__(self, id, model):
        '''
        id: mode id
        model: Keras Model only one worker at a time can call predict
        '''
        self.mutex = Lock()
        self.id = id
        self.model = model

manager = Manager()
shared_list = manager.list() # a List of models
shared_list.append(SharedResource())

def worker1(l):
    ...read some data
    while True:
        resource = l[0]
        with m:
            resource['model'].predict(...some data)
        time.sleep(60)

if __name__ == "__main__":
    processes = [ Process(target=worker1, args=[shared_list])]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
You are getting this error because multiprocessing.Lock is not a class: it is actually a function (a method of the default context object).
In .../multiprocessing/context.py there are these lines:
def Lock(self):
    '''Returns a non-recursive lock object'''
    from .synchronize import Lock
    return Lock(ctx=self.get_context())
This may change in the future, so you can verify it on your version of Python by running:
import multiprocessing
print(type(multiprocessing.Lock))
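A slightly fuller version of that check, as a minimal sketch: it uses the standard inspect module to confirm that multiprocessing.Lock is a bound method while multiprocessing.synchronize.Lock is a real class. The variable names are illustrative only, and the result reflects current CPython, so it could differ on other versions.

```python
import inspect
import multiprocessing
from multiprocessing import synchronize

# multiprocessing.Lock is a bound method of the default context object,
# which is why it cannot be used as a base class.
factory_is_method = inspect.ismethod(multiprocessing.Lock)

# The class that the factory instantiates lives in multiprocessing.synchronize.
real_lock_is_class = inspect.isclass(synchronize.Lock)

print(type(multiprocessing.Lock))
print(factory_is_method, real_lock_is_class)
```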
To actually subclass Lock you will need to do something like this:
from multiprocessing import synchronize
from multiprocessing.synchronize import Lock
# Since Lock is now a class, this should work:
class SharedResource(Lock):
    pass
I'm not endorsing this approach as a "good" solution, but it should solve your problem if you really need to subclass Lock. Subclassing things that try to avoid being subclassed is usually not a great idea, but sometimes it can be necessary. If you can solve the problem in a different way you may want to consider that.
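If you do go down this route, note that in current CPython the Lock class in multiprocessing.synchronize takes a keyword-only ctx argument, so the subclass has to supply a context itself. Here is a minimal sketch assuming that signature (it is an implementation detail and may change). The data attribute is only an illustration: it is ordinary per-process state, not shared memory.

```python
import multiprocessing
from multiprocessing.synchronize import Lock


class SharedResource(Lock):
    def __init__(self, ctx=None):
        # synchronize.Lock requires a context; fall back to the default one.
        super().__init__(ctx=ctx or multiprocessing.get_context())
        # Plain attribute: each process sees its own copy of this dict.
        self.data = {}


resource = SharedResource()
with resource:  # acquires the underlying lock, then releases it on exit
    resource.data['model'] = 'placeholder'

# A non-blocking acquire succeeds because the lock was released above.
acquired_again = resource.acquire(False)
resource.release()
```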
Related
There are plenty of examples of using a multiprocessing BaseManager-derived class to register a method for returning a queue handle proxy, that clients can then use to pull/put from the queue.
This is great, but I have a different scenario - what if the number of queues that I need to proxy changes in response to outside events? What I really want is to proxy a method that returns a specific queue given a UID.
I tried this out but I couldn't get it to work, it appears that the only things that are available are what is registered with the class before the object is instantiated. I'm unable to BaseManager.register("my-new-queue", lambda: queue.Queue) once I've already instantiated an instance of that class and caused it to run.
Is there any way around this? It feels to me like we should be able to dynamically handle this
The registration is most important in the "server" process where the callable will actually get called. Registering a callable in a "client" process only adds that typeid (the string you pass to register) as a method to the manager class. The rub is that running the server blocks, preventing you from registering new callables, and it occurs in another process making it further difficult to modify the registry.
I've been tinkering with this a little while... imao managers are cursed.. I think your prior question would also be answered (aside from our discussion in the comments) by the thing that solved it. Basically python attempts to be a little bit secure about not sending around the authkey parameter for proxied objects, but it stumbles sometimes (particularly with nested proxies). The fix is to set the default authkey for the process mp.current_process().authkey = b'abracadabra' which is used as the fallback when authkey=None (https://bugs.python.org/issue7503)
Here's my full testing script which is derived from the remote manager example from the docs. Basically I create a shared dict to hold shared queues:
#server process
from multiprocessing.managers import BaseManager, DictProxy
from multiprocessing import current_process
from queue import Queue
queues = {} #dict[uuid, Queue]
class QueueManager(BaseManager):
    pass
QueueManager.register('new_queue', callable=Queue)
QueueManager.register('get_queues', callable=lambda:queues, proxytype=DictProxy)
m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
current_process().authkey = b'abracadabra'
s = m.get_server()
s.serve_forever()
#process A
from multiprocessing.managers import BaseManager
from multiprocessing import current_process
class QueueManager(BaseManager):
    pass
QueueManager.register('new_queue')
QueueManager.register('get_queues')
m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
current_process().authkey = b'abracadabra'
queues_dict = m.get_queues()
queues_dict['my_uuid'] = m.new_queue()
queues_dict['my_uuid'].put("this is a test")
#process B
from multiprocessing.managers import BaseManager
from multiprocessing import current_process
class QueueManager(BaseManager):
    pass
QueueManager.register('new_queue')
QueueManager.register('get_queues')
m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
current_process().authkey = b'abracadabra'
queues_dict = m.get_queues()
print(queues_dict['my_uuid'].get())
EDIT:
Regarding the comments: "get_queue take the UUID and return the specific queue" the modification is simple, and does not involve nested proxies thereby avoiding the digest auth issue:
#server process
from multiprocessing.managers import BaseManager
from collections import defaultdict
from queue import Queue
queues = defaultdict(Queue)
class QueueManager(BaseManager): pass
QueueManager.register('get_queue', callable=lambda uuid:queues[uuid])
m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
s = m.get_server()
s.serve_forever()
#process A
from multiprocessing.managers import BaseManager
class QueueManager(BaseManager): pass
QueueManager.register('get_queue')
m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
m.get_queue("my_uuid").put("this is a test")
#process B
from multiprocessing.managers import BaseManager
class QueueManager(BaseManager): pass
QueueManager.register('get_queue')
m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
print(m.get_queue("my_uuid").get())
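To try this lookup-by-UID pattern without opening three terminals, one option is to host the server in a background thread of a single process. This is only a sketch for experimentation: the thread-hosted server, the connect helper, and the port-0 address (which asks the OS for any free port) are my assumptions and not part of the scripts above.

```python
import threading
from collections import defaultdict
from multiprocessing.managers import BaseManager
from queue import Queue

AUTHKEY = b'abracadabra'
queues = defaultdict(Queue)  # one queue per UID, created on first access


class QueueManager(BaseManager):
    pass


QueueManager.register('get_queue', callable=lambda uuid: queues[uuid])

# Port 0 lets the OS pick a free port; the real address is server.address.
server = QueueManager(address=('127.0.0.1', 0), authkey=AUTHKEY).get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()


def connect():
    """Hypothetical helper: returns a manager connected to the server above."""
    client = QueueManager(address=server.address, authkey=AUTHKEY)
    client.connect()
    return client


producer = connect()  # plays the role of "process A"
consumer = connect()  # plays the role of "process B"
producer.get_queue('my_uuid').put('this is a test')
received = consumer.get_queue('my_uuid').get(timeout=5)
print(received)
```

In a real deployment the server and the two clients would live in separate processes, as in the scripts above; only the wiring is compressed here.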
Aaron's answer is perhaps the simplest way here, where you share a dictionary and store the queues in that shared dictionary. However, it does not solve the problem of not being able to update the methods on a manager once it has started. Therefore, here is a more complete solution, less verbose than its alternative, where you can update the registry even after the server has started:
from queue import Queue
from multiprocessing.managers import SyncManager, Server, State, dispatch
from multiprocessing.context import ProcessError
class UpdateServer(Server):
    public = ['shutdown', 'create', 'accept_connection', 'get_methods',
              'debug_info', 'number_of_objects', 'dummy', 'incref', 'decref', 'update_registry']

    def update_registry(self, c, registry):
        with self.mutex:
            self.registry.update(registry)

    def get_server(self):
        if self._state.value != State.INITIAL:
            if self._state.value == State.STARTED:
                raise ProcessError("Already started server")
            elif self._state.value == State.SHUTDOWN:
                raise ProcessError("Manager has shut down")
            else:
                raise ProcessError(
                    "Unknown state {!r}".format(self._state.value))
        return self._Server(self._registry, self._address,
                            self._authkey, self._serializer)

class UpdateManager(SyncManager):
    _Server = UpdateServer

    def update_registry(self):
        assert self._state.value == State.STARTED, 'server not yet started'
        conn = self._Client(self._address, authkey=self._authkey)
        try:
            dispatch(conn, None, 'update_registry', (type(self)._registry, ), {})
        finally:
            conn.close()

class MyQueue:
    def __init__(self):
        self.initialized = False
        self.q = None

    def initialize(self):
        self.q = Queue()

    def __call__(self):
        if not self.initialized:
            self.initialize()
            self.initialized = True
        return self.q

if __name__ == '__main__':
    # Create an object of wrapper class, note that we do not initialize the queue right away (it's unpicklable)
    queue = MyQueue()
    manager = UpdateManager()
    manager.start()
    # If you register new typeids, then call update_registry. The method_to_typeid parameter maps the __call__ method to
    # return a proxy of the queue instead since Queues are not picklable
    UpdateManager.register('uuid', queue, method_to_typeid={'__call__': 'Queue'})
    manager.update_registry()
    # Once the object is stored in the manager process, now we can safely initialize the queue and share
    # it among processes. Initialization is implicit when we call uuid() if it's not already initialized
    q = manager.uuid()
    q.put('bye')
    print(q.get())
Here, UpdateServer and UpdateManager add support for an update_registry method, which informs the server of any new typeids registered with the manager. MyQueue is simply a wrapper class that returns the newly registered queue when called. While it's functionally similar to registering lambda: queue, the wrapper is necessary because lambda functions are not picklable and the server is started in a new process here (rather than calling server.serve_forever() in the main process, though you can do that too if you want).
So, you can now register typeids even after the manager process is running, just make sure to call the update_registry function right after. This function call will even work if you are starting the server in the main process itself (by using serve_forever, like in Aaron's answer) and connecting to it from another process using manager.connect.
I have implemented a websocket in a Django app using Django Channels. The front end now sends some data through the websocket, and I want the currently running Celery task to be able to read it. I tried creating a shared-memory static object, but it does not work.
SimulationInputs.add(simulation_id=simulation.id, init_data=init_inputs)
return InteractiveSimulationTask.delay_or_fail(
    simulation_id=simulation.id
)
class SimulationData:
    data = ''

class SimulationInputs:
    data = None

    @classmethod
    def init_manager(cls, manager):
        manager = Manager()
        cls.data = manager.dict()

    @classmethod
    def add(cls, simulation_id, init_data):
        cls.data[simulation_id] = init_data

    @classmethod
    def write(cls, simulation_id, simulation_data):
        if cls.data.get(simulation_id):
            cls.data[simulation_id] = simulation_data

    @classmethod
    def read(cls, simulation_id, simulation_data):
        simulation_data.data = cls.data.get(simulation_id)

# manage.py
if __name__ == "__main__":
    SimulationInputs.init_manager()

class InteractiveSimulationTask(JobtasticTask):
    def calculate_result(self, simulation_id, **kwargs):
        while True:
            SimulationInputs.read(simulation_id=self.simulation.id, simulation_data=simulation_data)
The task always throws an exception at cls.data.get(simulation_id): NoneObjectType has no method get
I need to share data between the celery task and the main process.
Any hint?
Since you're using Celery, you probably have Redis or some other in-memory store available. Consider using it as your indirection layer, i.e. have the read and write methods use the simulation_id as the key to the simulation data.
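As a rough sketch of that indirection layer: the class below stores each simulation's data as JSON under a key derived from simulation_id. SimulationStore, InMemoryClient and the key prefix are hypothetical names of mine. The only assumption about the real store is a client object exposing get(key) and set(key, value), which a redis-py client does. InMemoryClient lives in one process only, so it is there to make the sketch runnable and would have to be replaced by a real client for Django and Celery to see each other's writes.

```python
import json


class SimulationStore:
    """Reads and writes simulation data through a key-value client.

    `client` is assumed to expose get(key) and set(key, value).
    """

    def __init__(self, client, prefix='simulation:'):
        self.client = client
        self.prefix = prefix

    def write(self, simulation_id, data):
        # Serialize to JSON so any process can decode the value.
        self.client.set(self.prefix + str(simulation_id), json.dumps(data))

    def read(self, simulation_id):
        raw = self.client.get(self.prefix + str(simulation_id))
        return None if raw is None else json.loads(raw)


class InMemoryClient:
    """Stand-in for a real client so the sketch runs without a server."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


store = SimulationStore(InMemoryClient())
store.write(42, {'speed': 3})
print(store.read(42))
```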
I believe the issue you're facing is due to the lifecycle of the python class. In init_manager when you assign to cls.data you're overwriting the class's property, not the instance's property. This doesn't do what you want it to, as evidenced by the error message: cls.data is going to be None.
What I think you're going for is the "Singleton Pattern". You want to have one and only one SimulationInputs object which can read/write the data for each ID. This discussion can help you with implementing a singleton in Python.
I came to the conclusion that Django and Celery should not share memory: they run in different processes and are different programs, so they should communicate through a socket or a messaging system. I solved my problem by using Redis Pub/Sub: https://redis.io/topics/pubsub
I am following a preceding question here: how to add more items to a multiprocessing queue while script in motion
The code I am working with now:
import multiprocessing
class MyFancyClass:
    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print('Doing something fancy in {} for {}!'.format(proc_name, self.name))

def worker(q):
    while True:
        obj = q.get()
        if obj is None:
            break
        obj.do_something()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    queue.put(MyFancyClass('Fancy Dan'))
    queue.put(MyFancyClass('Frankie'))
    # print(queue.qsize())
    queue.put(None)
    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()
Right now, there are two items in the queue. If I replace those two lines with a list of, say, 50 items, how do I set up a Pool so that a number of processes are available? For example:
p = multiprocessing.Pool(processes=4)
Where does that go? I'd like to be able to run multiple items at once, especially if the items run for a while.
Thanks!
As a rule, you either use Pool or Process(es) plus Queues. Mixing both is a misuse; the Pool already uses Queues (or a similar mechanism) behind the scenes.
If you want to do this with a Pool, change your code to (moving code to main function for performance and better resource cleanup than running in global scope):
def main():
    myfancyclasses = [MyFancyClass('Fancy Dan'), ...] # define your MyFancyClass instances here
    with multiprocessing.Pool(processes=4) as p:
        # Submit all the work
        futures = [p.apply_async(fancy.do_something) for fancy in myfancyclasses]
        # Done submitting, let workers exit as they run out of work
        p.close()
        # Wait until all the work is finished
        for f in futures:
            f.wait()

if __name__ == '__main__':
    main()
This could be simplified further at the expense of purity, with the .*map* methods of Pool, e.g. to minimize memory usage redefine main as:
def main():
    myfancyclasses = [MyFancyClass('Fancy Dan'), ...] # define your MyFancyClass instances here
    with multiprocessing.Pool(processes=4) as p:
        # No return value, so we ignore it, but we need to run out the result
        # or the work won't be done
        for _ in p.imap_unordered(MyFancyClass.do_something, myfancyclasses):
            pass
Yes, technically either approach has slightly higher overhead, because the return value you're not using has to be serialized to give it back to the parent process. But in practice this cost is pretty low (since your function has no return, it's returning None, which serializes to almost nothing). An advantage of this approach is that you generally don't want to print to the screen from the child processes (since their output ends up interleaved); you can replace the printing with returns and let the parent do the work, e.g.:
import multiprocessing
class MyFancyClass:
    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        # Changed from print to return
        return 'Doing something fancy in {} for {}!'.format(proc_name, self.name)

def main():
    myfancyclasses = [MyFancyClass('Fancy Dan'), ...] # define your MyFancyClass instances here
    with multiprocessing.Pool(processes=4) as p:
        # Using the return value now to avoid interleaved output
        for res in p.imap_unordered(MyFancyClass.do_something, myfancyclasses):
            print(res)

if __name__ == '__main__':
    main()
Note how all of these solutions remove the need to write your own worker function, or manually manage Queues, because Pools do that grunt work for you.
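For the common case of a fixed list of items where you want the results back in input order, Pool.map may be all you need. A minimal sketch: run_all and the sample names are made up for illustration, and do_something returns its message instead of printing it.

```python
import multiprocessing


class MyFancyClass:
    def __init__(self, name):
        self.name = name

    def do_something(self):
        return 'Doing something fancy for {}!'.format(self.name)


def run_all(names, processes=4):
    """Hypothetical helper: runs do_something for every name in a pool."""
    items = [MyFancyClass(name) for name in names]
    with multiprocessing.Pool(processes=processes) as pool:
        # map blocks until every item is done and keeps the input order.
        return pool.map(MyFancyClass.do_something, items)


if __name__ == '__main__':
    for line in run_all(['Fancy Dan', 'Frankie']):
        print(line)
```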
Alternate approach using concurrent.futures to efficiently process results as they become available, while allowing you to choose to submit new work (either based on the results, or based on external information) as you go:
import concurrent.futures
from concurrent.futures import FIRST_COMPLETED
def main():
    allow_new_work = True # Set to False to indicate we'll no longer allow new work
    myfancyclasses = [MyFancyClass('Fancy Dan'), ...] # define your initial MyFancyClass instances here
    with concurrent.futures.ProcessPoolExecutor() as executor:
        remaining_futures = {executor.submit(fancy.do_something)
                             for fancy in myfancyclasses}
        while remaining_futures:
            done, remaining_futures = concurrent.futures.wait(remaining_futures,
                                                              return_when=FIRST_COMPLETED)
            for fut in done:
                result = fut.result()
                # Do stuff with result, maybe submit new work in response
            if allow_new_work:
                if should_stop_checking_for_new_work():
                    allow_new_work = False
                    # Let the workers exit when all remaining tasks done,
                    # and reject submitting more work from now on
                    executor.shutdown(wait=False)
                elif has_more_work():
                    # Assumed to return collection of new MyFancyClass instances
                    new_fanciness = get_more_fanciness()
                    remaining_futures |= {executor.submit(fancy.do_something)
                                          for fancy in new_fanciness}
                    myfancyclasses.extend(new_fanciness)
I'm having the following problem. I want to implement a web crawler. So far it worked, but it was so slow that I tried to use multiprocessing for fetching the URLs.
Unfortunately I'm not very experienced in this field.
After some reading, the easiest way seemed to be to use the map method from multiprocessing.pool.
But I constantly get the following error:
TypeError: Pickling an AuthenticationString object is disallowed for security reasons
I found very few cases with the same error and they unfortunately did not help me.
I created a stripped version of my code which can reproduce the error:
import multiprocessing
class TestCrawler:
    def __init__(self):
        self.m = multiprocessing.Manager()
        self.queue = self.m.Queue()
        for i in range(50):
            self.queue.put(str(i))
        self.pool = multiprocessing.Pool(6)

    def mainloop(self):
        self.process_next_url(self.queue)
        while True:
            self.pool.map(self.process_next_url, (self.queue,))

    def process_next_url(self, queue):
        url = queue.get()
        print(url)

c = TestCrawler()
c.mainloop()
I would be very thankful for any help or suggestions!
Question: But I constantly get the following error:
The error you're getting is misleading. The reason is:
self.queue = self.m.Queue()
Move the Queue instantiation outside the class TestCrawler.
This leads to another error:
NotImplementedError: pool objects cannot be passed between processes or pickled
The reason is:
self.pool = multiprocessing.Pool(6)
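Both errors have the same cause: passing the bound method self.process_next_url to the pool pickles self along with it, and the Manager and Pool members of the class cannot be pickled.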
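The pickling behaviour can be reproduced without any processes. In the sketch below, which is my illustration and not the asker's code, a threading.Lock stands in for an unpicklable member such as a Pool or a Manager. Pickling the bound method fails because the whole instance is pickled with it, and a __getstate__ that leaves the member out makes it work. Crawler, PicklableCrawler and the resource attribute are hypothetical names.

```python
import pickle
import threading


class Crawler:
    def __init__(self):
        self.urls = ['url0', 'url1']
        # Stand-in for an unpicklable member such as a Pool or a Manager.
        self.resource = threading.Lock()

    def process(self, url):
        return 'processed ' + url


class PicklableCrawler(Crawler):
    def __getstate__(self):
        # Leave the unpicklable member out of the pickled state.
        state = self.__dict__.copy()
        del state['resource']
        return state


try:
    # Pickling a bound method pickles the instance it is bound to.
    pickle.dumps(Crawler().process)
    failed = False
except TypeError:
    failed = True

# With the member excluded, the bound method round-trips through pickle.
restored = pickle.loads(pickle.dumps(PicklableCrawler().process))
print(failed, restored('url0'))
```

The restored object has no resource attribute, so this only helps when the worker-side method does not need that member.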
Note: endless loop!
The following while loop never terminates and will overload your system.
Furthermore, your pool.map(... call submits only one task, so only one process does any work:
while True:
    self.pool.map(self.process_next_url, (self.queue,))
I suggest reading the examples that demonstrate the use of a pool.
Change to the following:
import multiprocessing as mp

class TestCrawler:
    def __init__(self, tasks):
        # Assign the global task queue to a class member
        self.queue = tasks
        for i in range(50):
            self.queue.put(str(i))

    def mainloop(self):
        # Instantiate the pool locally
        pool = mp.Pool(6)
        for n in range(50):
            # .map requires an argument, so pass None
            pool.map(self.process_next_url, (None,))

    # None is passed
    def process_next_url(self, dummy):
        url = self.queue.get()
        print(url)

if __name__ == "__main__":
    # Create the Queue as a global
    tasks = mp.Manager().Queue()
    # Pass the Queue to your class TestCrawler
    c = TestCrawler(tasks)
    c.mainloop()
This example starts a pool of 5 processes that share 50 tasks (URLs) between them:
class TestCrawler2:
    def __init__(self, tasks):
        self.tasks = tasks

    def start(self):
        pool = mp.Pool(5)
        pool.map(self.process_url, self.tasks)

    def process_url(self, url):
        print('self.process_url({})'.format(url))

if __name__ == "__main__":
    tasks = ['url{}'.format(n) for n in range(50)]
    TestCrawler2(tasks).start()
Tested with Python: 3.4.2
I created a singleton class using a metaclass. It works well with multithreading and creates only one instance of the MySingleton class, but with multiprocessing it always creates a new instance.
import multiprocessing
class SingletonType(type):
    # meta class for making a class singleton
    def __call__(cls, *args, **kwargs):
        try:
            return cls.__instance
        except AttributeError:
            cls.__instance = super(SingletonType, cls).__call__(*args, **kwargs)
            return cls.__instance

class MySingleton(object):
    # singleton class
    __metaclass__ = SingletonType

    def __init__(*args,**kwargs):
        print "init called"

def task():
    # create singleton class instance
    a = MySingleton()

# create two process
pro_1 = multiprocessing.Process(target=task)
pro_2 = multiprocessing.Process(target=task)
# start process
pro_1.start()
pro_2.start()
My output:
init called
init called
I need the MySingleton class's init method to be called only once.
Each of your child processes runs its own instance of the Python interpreter, hence the SingletonType in one process doesn't share its state with those in another process. This means that a true singleton that only exists in one of your processes will be of little use, because you won't be able to use it in the other processes: while you can manually share data between processes, that is limited to only basic data types (for example dicts and lists).
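To make the point concrete, here is a Python 3 sketch of the metaclass from the question. It is my rewrite, not the asker's exact code: the _instances dict and the init_calls counter are additions for illustration. Within one process the second call returns the cached instance, but that cache is ordinary per-process memory, so an instance created in one child process is not visible in another.

```python
class SingletonType(type):
    """Metaclass that caches one instance per class, per process."""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            # First call in this process: build and remember the instance.
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class MySingleton(metaclass=SingletonType):
    init_calls = 0  # counts how often __init__ runs in this process

    def __init__(self):
        type(self).init_calls += 1


first = MySingleton()
second = MySingleton()
print(first is second, MySingleton.init_calls)
```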
Instead of relying on singletons, simply share the underlying data between the processes:
#!/usr/bin/env python3
import multiprocessing
import os
def log(s):
    print('{}: {}'.format(os.getpid(), s))

class PseudoSingleton(object):
    def __init__(self, *args, **kwargs):
        # Check and initialize while holding the lock, so that two
        # processes cannot both see the state as uninitialized
        with shared_state_lock:
            if not shared_state:
                log('Initializating shared state')
                shared_state['x'] = 1
                shared_state['y'] = 2
                log('Shared state initialized')
            else:
                log('Shared state was already initalized: {}'.format(shared_state))

def task():
    a = PseudoSingleton()

if __name__ == '__main__':
    # We need the __main__ guard so that this part is only executed in
    # the parent
    log('Communication setup')
    shared_state = multiprocessing.Manager().dict()
    shared_state_lock = multiprocessing.Lock()
    # create two process
    log('Start child processes')
    pro_1 = multiprocessing.Process(target=task)
    pro_2 = multiprocessing.Process(target=task)
    pro_1.start()
    pro_2.start()
    # Wait until processes have finished
    # See https://stackoverflow.com/a/25456494/857390
    log('Wait for children')
    pro_1.join()
    pro_2.join()
    log('Done')
This prints
16194: Communication setup
16194: Start child processes
16194: Wait for children
16200: Initializating shared state
16200: Shared state initialized
16201: Shared state was already initalized: {'x': 1, 'y': 2}
16194: Done
However, depending on your problem setting there might be better solutions using other mechanisms of inter-process communication. For example, the Queue class is often very useful.