How does one pass semaphore locks and shared objects into threads? - python

How does one pass semaphore locks into threading.Thread objects in Python?
I am using locks to allow different threading.Thread classes to modify shared resources in their run functions. However, I'm unsure how to pass in the lock which is declared in a "main" file while the two threading objects are defined in separate files. For example:
Let's say I have one class producing data in producer.py:
import threading

class producer(threading.Thread):
    def __init__(self, lock, shared):
        threading.Thread.__init__(self)
        self.lock = lock        # Must be linked with global_lock
        self.shared = shared    # Must also be linked with global_shared
        self.sock = # Declare listening socket

    def run(self):
        self.lock.acquire()
        self.shared = self.sock.recv_msg()  # Get data from socket
        self.lock.release()
And another class consuming data in consumer.py:
import threading

class consumer(threading.Thread):
    def __init__(self, lock, shared):
        threading.Thread.__init__(self)
        self.lock = lock        # Must be linked with global_lock
        self.shared = shared    # Must also be linked with global_shared

    def run(self):
        self.lock.acquire()
        print self.shared
        self.shared = None
        self.lock.release()
Then, in main.py:
import threading

global_lock = threading.Lock()
global_shared = {}

producer_thread = producer(global_lock, global_shared)
consumer_thread = consumer(global_lock, global_shared)
producer_thread.start()
consumer_thread.start()
I understand that I could use global variables and declare them at the beginning of the run methods in each class, but this is undesirable since the purpose of these classes is to be reusable.
How can I pass a shared lock between two threading.Thread objects such that their .run() methods use the shared lock and the shared object? Thanks in advance!
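A minimal sketch of one way this can work (hypothetical names, not from the original post): pass the same Lock and a mutable container such as a dict to both constructors, and mutate the container instead of rebinding self.shared, so both threads keep referring to the same object:

import threading

class Producer(threading.Thread):
    def __init__(self, lock, shared):
        super(Producer, self).__init__()
        self.lock = lock        # the same Lock object the consumer gets
        self.shared = shared    # the same dict object the consumer gets

    def run(self):
        with self.lock:
            self.shared['data'] = 'produced'   # mutate the dict, don't rebind it

class Consumer(threading.Thread):
    def __init__(self, lock, shared):
        super(Consumer, self).__init__()
        self.lock = lock
        self.shared = shared

    def run(self):
        with self.lock:
            # may be None if the consumer happens to run first; the lock only
            # serializes access, it does not order the threads
            print(self.shared.pop('data', None))

global_lock = threading.Lock()
global_shared = {}

producer_thread = Producer(global_lock, global_shared)
consumer_thread = Consumer(global_lock, global_shared)
producer_thread.start()
consumer_thread.start()
producer_thread.join()
consumer_thread.join()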

Related

Can I dynamically register objects to proxy with a multiprocessing BaseManager?

There are plenty of examples of using a multiprocessing BaseManager-derived class to register a method that returns a queue handle proxy, which clients can then use to put to and pull from the queue.
This is great, but I have a different scenario - what if the number of queues that I need to proxy changes in response to outside events? What I really want is to proxy a method that returns a specific queue given a UID.
I tried this out but I couldn't get it to work; it appears that the only things available are whatever was registered with the class before the object was instantiated. I'm unable to call BaseManager.register("my-new-queue", lambda: queue.Queue) once I've already instantiated an instance of that class and set it running.
Is there any way around this? It feels like we should be able to handle this dynamically.
The registration is most important in the "server" process where the callable will actually get called. Registering a callable in a "client" process only adds that typeid (the string you pass to register) as a method to the manager class. The rub is that running the server blocks, preventing you from registering new callables, and it occurs in another process making it further difficult to modify the registry.
I've been tinkering with this for a little while... in my opinion, managers are cursed. I think your prior question would also be answered (aside from our discussion in the comments) by the thing that solved it. Basically, Python tries to be a little bit secure by not sending the authkey parameter around for proxied objects, but it stumbles sometimes (particularly with nested proxies). The fix is to set the default authkey for the process, mp.current_process().authkey = b'abracadabra', which is used as the fallback when authkey=None (https://bugs.python.org/issue7503).
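In isolation, that one-line fix looks like this (a minimal illustration, not part of the testing script below):

import multiprocessing as mp

# fallback authkey used whenever a proxy is constructed with authkey=None,
# e.g. for nested proxies handed back by the manager (see bpo-7503)
mp.current_process().authkey = b'abracadabra'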
Here's my full testing script which is derived from the remote manager example from the docs. Basically I create a shared dict to hold shared queues:
# server process
from multiprocessing.managers import BaseManager, DictProxy
from multiprocessing import current_process
from queue import Queue

queues = {}  # dict[uuid, Queue]

class QueueManager(BaseManager):
    pass

QueueManager.register('new_queue', callable=Queue)
QueueManager.register('get_queues', callable=lambda: queues, proxytype=DictProxy)

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
current_process().authkey = b'abracadabra'
s = m.get_server()
s.serve_forever()
# process A
from multiprocessing.managers import BaseManager
from multiprocessing import current_process

class QueueManager(BaseManager):
    pass

QueueManager.register('new_queue')
QueueManager.register('get_queues')

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
current_process().authkey = b'abracadabra'

queues_dict = m.get_queues()
queues_dict['my_uuid'] = m.new_queue()
queues_dict['my_uuid'].put("this is a test")
# process B
from multiprocessing.managers import BaseManager
from multiprocessing import current_process

class QueueManager(BaseManager):
    pass

QueueManager.register('new_queue')
QueueManager.register('get_queues')

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
current_process().authkey = b'abracadabra'

queues_dict = m.get_queues()
print(queues_dict['my_uuid'].get())
EDIT:
Regarding the comment "have get_queue take the UUID and return the specific queue": the modification is simple, and it does not involve nested proxies, thereby avoiding the digest-auth issue:
# server process
from multiprocessing.managers import BaseManager
from collections import defaultdict
from queue import Queue

queues = defaultdict(Queue)  # a fresh Queue is created on first access of each uuid

class QueueManager(BaseManager): pass

QueueManager.register('get_queue', callable=lambda uuid: queues[uuid])

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
s = m.get_server()
s.serve_forever()

# process A
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager): pass

QueueManager.register('get_queue')

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
m.get_queue("my_uuid").put("this is a test")

# process B
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager): pass

QueueManager.register('get_queue')

m = QueueManager(address=('localhost', 50000), authkey=b'abracadabra')
m.connect()
print(m.get_queue("my_uuid").get())
Aaron's answer is perhaps the simplest way here: you share a dictionary and store the queues in that shared dictionary. However, it does not answer the problem of not being able to update the methods on a manager once it has started. Therefore, here is a more complete solution, less verbose than its alternative, where you can update the registry even after the server has started:
from queue import Queue
from multiprocessing.managers import SyncManager, Server, State, dispatch
from multiprocessing.context import ProcessError

class UpdateServer(Server):
    # same as the stock Server, plus an extra public method for updating the registry
    public = ['shutdown', 'create', 'accept_connection', 'get_methods',
              'debug_info', 'number_of_objects', 'dummy', 'incref', 'decref',
              'update_registry']

    def update_registry(self, c, registry):
        with self.mutex:
            self.registry.update(registry)

class UpdateManager(SyncManager):
    _Server = UpdateServer

    def get_server(self):
        # same checks as BaseManager.get_server, but return our UpdateServer so
        # that serve_forever() in the main process also supports update_registry
        if self._state.value != State.INITIAL:
            if self._state.value == State.STARTED:
                raise ProcessError("Already started server")
            elif self._state.value == State.SHUTDOWN:
                raise ProcessError("Manager has shut down")
            else:
                raise ProcessError(
                    "Unknown state {!r}".format(self._state.value))
        return self._Server(self._registry, self._address,
                            self._authkey, self._serializer)

    def update_registry(self):
        # push the (possibly extended) class registry to the already running server
        assert self._state.value == State.STARTED, 'server not yet started'
        conn = self._Client(self._address, authkey=self._authkey)
        try:
            dispatch(conn, None, 'update_registry', (type(self)._registry, ), {})
        finally:
            conn.close()

class MyQueue:
    # wrapper that lazily creates a Queue the first time it is called
    def __init__(self):
        self.initialized = False
        self.q = None

    def initialize(self):
        self.q = Queue()

    def __call__(self):
        if not self.initialized:
            self.initialize()
            self.initialized = True
        return self.q

if __name__ == '__main__':
    # Create an object of the wrapper class; note that we do not initialize the
    # queue right away (it's unpicklable)
    queue = MyQueue()
    manager = UpdateManager()
    manager.start()

    # If you register new typeids, then call update_registry. The method_to_typeid
    # parameter maps the __call__ method to return a proxy of the queue instead,
    # since Queues are not picklable
    UpdateManager.register('uuid', queue, method_to_typeid={'__call__': 'Queue'})
    manager.update_registry()

    # Once the object is stored in the manager process, we can safely initialize
    # the queue and share it among processes. Initialization is implicit when we
    # call uuid() if it's not already initialized
    q = manager.uuid()
    q.put('bye')
    print(q.get())
Here, UpdateServer and UpdateManager add support for an update_registry method, which informs the server whenever new typeids are registered with the manager. MyQueue is simply a wrapper class that returns the newly registered queue when called directly. It is functionally similar to registering lambda: queue, but the wrapper is necessary because lambda functions are not picklable and the server is being started in a new process here (rather than doing server.serve_forever() in the main process, though you can do that too if you want).
So you can now register typeids even after the manager process is running; just make sure to call update_registry right afterwards. This call will even work if you start the server in the main process itself (using serve_forever, like in Aaron's answer) and connect to it from another process using manager.connect.
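To make that last point concrete, here is a hedged sketch of a separate client process; it assumes the UpdateManager above was started with an explicit address and authkey (for example UpdateManager(address=('localhost', 50001), authkey=b'abracadabra')) rather than the defaults:

# hypothetical client process; assumes the server-side UpdateManager was started
# with address=('localhost', 50001) and authkey=b'abracadabra'
from multiprocessing.managers import SyncManager

class UpdateManager(SyncManager):
    pass

UpdateManager.register('uuid')   # no callable is needed on the client side

m = UpdateManager(address=('localhost', 50001), authkey=b'abracadabra')
m.connect()

q = m.uuid()     # proxy to the Queue living in the manager process
print(q.get())   # receives whatever another process put() on it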

Python subclassing multiprocessing.Lock

I'm trying to understand why Python cannot compile the following class.
import multiprocessing

class SharedResource(multiprocessing.Lock):
    def __init__(self, blocking=True, timeout=-1):
        # super().__init__(blocking=True, timeout=-1)
        self.blocking = blocking
        self.timeout = timeout
        self.data = {}
TypeError: method expected 2 arguments, got 3
The reason why I'm subclassing Lock: my objective is to create a shared list of resources that should be usable by only one process at a time.
This will eventually live in a Flask application where requests should not be able to use the resource concurrently.
The workaround I tried instead (below) fails with:
RuntimeError: Lock objects should only be shared between processes through inheritance
from multiprocessing import Process, Manager, Lock
import time

class SharedResource():
    def __init__(self, id, model):
        '''
        id: model id
        model: Keras model; only one worker at a time can call predict
        '''
        self.mutex = Lock()
        self.id = id
        self.model = model

manager = Manager()
shared_list = manager.list()  # a list of models
shared_list.append(SharedResource())

def worker1(l):
    # ...read some data
    while True:
        resource = l[0]
        with resource.mutex:
            resource.model.predict(...)  # ...some data
        time.sleep(60)

if __name__ == "__main__":
    processes = [Process(target=worker1, args=[shared_list])]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
The reason you are getting this error is that multiprocessing.Lock is actually a function.
In .../multiprocessing/context.py there are these lines:
def Lock(self):
    '''Returns a non-recursive lock object'''
    from .synchronize import Lock
    return Lock(ctx=self.get_context())
This may change in the future so you can verify this on your version of python by doing:
import multiprocessing
print(type(multiprocessing.Lock))
To actually subclass Lock you will need to do something like this:
from multiprocessing.synchronize import Lock

# Since Lock is now a class, this should work:
class SharedResource(Lock):
    pass
I'm not endorsing this approach as a "good" solution, but it should solve your problem if you really need to subclass Lock. Subclassing things that try to avoid being subclassed is usually not a great idea, but sometimes it can be necessary. If you can solve the problem in a different way you may want to consider that.
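One extra wrinkle worth noting (my addition, not part of the answer above): in Python 3, multiprocessing.synchronize.Lock.__init__ takes a required ctx keyword argument, so a subclass still has to supply a context when it is instantiated. A minimal sketch:

import multiprocessing
from multiprocessing.synchronize import Lock

class SharedResource(Lock):
    def __init__(self, ctx=None):
        # synchronize.Lock requires an explicit multiprocessing context
        super().__init__(ctx=ctx or multiprocessing.get_context())
        self.data = {}

resource = SharedResource()
with resource:              # behaves like a normal Lock
    resource.data['x'] = 1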

Pass a Queue to watchdog thread in python

I have a class Indexer which is instantiated from the main thread, and the instance is stored in a variable, say, indexer. watchdog.observers.Observer() watches directories for changes, and those callbacks run in another thread. I tried passing this indexer variable from the main thread into my handler Vigilante, which was passed to ob.schedule(Vigilante(indexer)) alongside some other variables from the main thread. I can't access the indexer variable in the Vigilante class because they live in different threads. I know I could use a Queue, but I don't know how I'd pass the Queue to watchdog's thread.
Here is the code from main thread:
if watch:
    import watchdog.observers
    from .utils import vigilante

    class callbacks:
        def __init__(self):
            pass

        @staticmethod
        def build(filename, response):
            return _build(filename, response)

        @staticmethod
        def renderer(src, mode):
            return render(src, mode)

    handler = vigilante.Vigilante(_filter, ignore, Indexer, callbacks, Mode)
    path_to_watch = os.path.normpath(os.path.join(workspace, '..'))

    ob = watchdog.observers.Observer()
    ob.schedule(handler, path=path_to_watch, recursive=True)
    ob.start()

    import time
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        ob.stop()
        Indexer.close()
    ob.join()
The Indexer class is meant to write to a database from another part of the code where the Indexer was instantiated.
Here is the code from Vigilante class running in watchdog's thread:
class Vigilante(PatternMatchingEventHandler):
    """Helps to watch files, directories for changes"""

    def __init__(self, pattern, ignore, indexer, callback, mode):
        pattern.append("*.yml")
        self.Callback = callback
        self.Mode = mode
        self.Indexer = indexer
        super(Vigilante, self).__init__(patterns=pattern, ignore_directories=ignore)

    def vigil(self, event):
        print(event.src_path, 'modified')
        IndexReader = self.Indexer.get_index_on(event.src_path)
        dep = IndexReader.read_index()
        print(dep.next(), 'dependency')
        feedout = self.Callback.build(
            os.path.basename(event.src_path),
            self.Callback.renderer(event.src_path, self.Mode.FILE_MODE)
        )

    def on_modified(self, event):
        self.vigil(event)

    def on_created(self, event):
        self.vigil(event)
All I need is a way to pass those variables from the main thread to watchdog's thread, through the Vigilante class
I finally found a way to do it without crossing threads as much as before, based on an idea from @EvertW's answer. I pass a Queue from the main thread to the Vigilante class, which runs in another thread, so every modified file is put on the Queue. Then, from the main thread, I get the modified file from the queue and read from the Indexer database; every other task the Vigilante.vigil method used to perform moved to the main thread, since those tasks depend on the modified file and on what is read from the Indexer database.
This error disappeared:
SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 9788 and this is thread id 4288.
Here is a snippet from what I did:
....
q = Queue.LifoQueue(10)
handler = vigilante.Vigilante(q, _filter, ignore)
path_to_watch = os.path.normpath(os.path.join(workspace, '..'))

ob = watchdog.observers.Observer()
ob.schedule(handler, path=path_to_watch, recursive=True)
ob.start()

import time
try:
    while True:
        if not q.empty():
            modified = q.get()
            IndexReader = Indexer.get_index_on(modified)
            deps = IndexReader.read_index()
            print(deps.next(), 'dependency')
            # TODO
        else:
            print('q is empty')
        time.sleep(1)
except KeyboardInterrupt:
    ob.stop()
    Indexer.close()
ob.join()
Vigilante class:
class Vigilante(PatternMatchingEventHandler):
    """Helps to watch files, directories for changes"""

    def __init__(self, q, pattern, ignore):
        self.q = q
        super(Vigilante, self).__init__(
            patterns=pattern,
            ignore_patterns=ignore,
            ignore_directories=True
        )

    def vigil(self, event):
        print(event.src_path, 'modified')
        self.q.put(event.src_path)

    def on_modified(self, event):
        self.vigil(event)

    def on_created(self, event):
        self.vigil(event)
....
PS: A word of advice to whoever comes across this kind of threading problem with watchdog: don't rely on watchdog's thread to do work on the modified files; just get the modified files out and handle them from the main thread, unless the task is a simple one.
You could try the Observer pattern (no pun intended), i.e. let the Observer class have a list of listeners that it will inform of any changes it sees. Then let the indexer announce its interest to the Observer.
In my example, the Observer expects subscribers to be callables that receive the changes. Then you can do:
from queue import Queue
from threading import Thread

class Observable:
    def __init__(self):
        self.listeners = []

    def subscribe(self, listener):
        self.listeners.append(listener)

    def onNotify(self, change):
        for listener in self.listeners:
            listener(change)

class Indexer(Thread):
    def __init__(self, observer):
        Thread.__init__(self)
        self.q = Queue()
        observer.subscribe(self.q.put)

    def run(self):
        while True:
            change = self.q.get()
            # ... handle the change in the Indexer's own thread ...
Because the standard Queue is completely thread-safe, this will work fine.
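A small wiring sketch (names are hypothetical) showing how the two classes above fit together: the watchdog handler only calls onNotify, while the Indexer drains its own queue in its own thread:

observable = Observable()

indexer = Indexer(observable)   # subscribes its queue's put() method as a listener
indexer.start()

# from the watchdog handler (running in watchdog's thread), just announce the change:
observable.onNotify('/path/that/changed')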

Make Singleton class in Multiprocessing

I created a Singleton class using a metaclass. It works well with multiple threads and creates only one instance of the MySingleton class, but with multiprocessing it always creates a new instance.
import multiprocessing

class SingletonType(type):
    # metaclass for making a class a singleton
    def __call__(cls, *args, **kwargs):
        try:
            return cls.__instance
        except AttributeError:
            cls.__instance = super(SingletonType, cls).__call__(*args, **kwargs)
            return cls.__instance

class MySingleton(object):
    # singleton class
    __metaclass__ = SingletonType

    def __init__(*args, **kwargs):
        print "init called"

def task():
    # create a singleton class instance
    a = MySingleton()

# create two processes
pro_1 = multiprocessing.Process(target=task)
pro_2 = multiprocessing.Process(target=task)

# start the processes
pro_1.start()
pro_2.start()
My output:
init called
init called
I need the MySingleton class's __init__ method to be called only once.
Each of your child processes runs its own instance of the Python interpreter, hence the SingletonType in one process doesn't share its state with those in another process. This means that a true singleton that only exists in one of your processes will be of little use, because you won't be able to use it in the other processes: while you can manually share data between processes, that is limited to only basic data types (for example dicts and lists).
Instead of relying on singletons, simply share the underlying data between the processes:
#!/usr/bin/env python3

import multiprocessing
import os

def log(s):
    print('{}: {}'.format(os.getpid(), s))

class PseudoSingleton(object):

    def __init__(*args, **kwargs):
        if not shared_state:
            log('Initializating shared state')
            with shared_state_lock:
                shared_state['x'] = 1
                shared_state['y'] = 2
            log('Shared state initialized')
        else:
            log('Shared state was already initalized: {}'.format(shared_state))

def task():
    a = PseudoSingleton()

if __name__ == '__main__':
    # We need the __main__ guard so that this part is only executed in
    # the parent
    log('Communication setup')
    shared_state = multiprocessing.Manager().dict()
    shared_state_lock = multiprocessing.Lock()

    # create two processes
    log('Start child processes')
    pro_1 = multiprocessing.Process(target=task)
    pro_2 = multiprocessing.Process(target=task)
    pro_1.start()
    pro_2.start()

    # Wait until the processes have finished
    # See https://stackoverflow.com/a/25456494/857390
    log('Wait for children')
    pro_1.join()
    pro_2.join()

    log('Done')
This prints
16194: Communication setup
16194: Start child processes
16194: Wait for children
16200: Initializating shared state
16200: Shared state initialized
16201: Shared state was already initalized: {'x': 1, 'y': 2}
16194: Done
However, depending on your problem setting there might be better solutions using other mechanisms of inter-process communication. For example, the Queue class is often very useful.
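For instance, a minimal sketch of handing work to a child process through a multiprocessing.Queue instead of trying to share a singleton:

import multiprocessing

def worker(q):
    while True:
        item = q.get()
        if item is None:          # sentinel value: no more work
            break
        print('worker got:', item)

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    q.put('hello')
    q.put(None)
    p.join()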

How to pass and run a callback method in Python

I have a Manager (main thread), that creates other Threads to handle various operations.
I would like my Manager to be notified when a Thread it created ends (when run() method execution is finished).
I know I could do it by checking the status of all my threads with the Thread.is_alive() method, but polling sucks, so I wanted notifications.
I was thinking of giving a callback method to the Threads, and call this function at the end of the run() method:
class Manager():
    ...
    MyThread(self.on_thread_finished).start() # How do I pass the callback?

    def on_thread_finished(self, data):
        pass
    ...

class MyThread(Thread):
    ...
    def run(self):
        ....
        self.callback(data) # How do I call the callback?
    ...
Thanks!
The thread can't call the manager unless it has a reference to the manager. The easiest way for that to happen is for the manager to give it to the thread at instantiation.
from threading import Thread

class Manager(object):
    def new_thread(self):
        return MyThread(parent=self)

    def on_thread_finished(self, thread, data):
        print thread, data

class MyThread(Thread):
    def __init__(self, parent=None):
        self.parent = parent
        super(MyThread, self).__init__()

    def run(self):
        # ...
        self.parent and self.parent.on_thread_finished(self, 42)

mgr = Manager()
thread = mgr.new_thread()
thread.start()
If you want to be able to assign an arbitrary function or method as a callback, rather than storing a reference to the manager object, this becomes a bit problematic because of method wrappers and such. It's hard to design the callback so it gets a reference to both the manager and the thread, which is what you will want. I worked on that for a while and did not come up with anything I'd consider useful or elegant.
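One hedged workaround for that (my sketch, not from the answer above): pass a bound method of the manager as the callback and have the thread hand itself back as the first argument, so the callback ends up with a reference to both the manager (through the binding) and the thread:

from threading import Thread

class Manager(object):
    def on_thread_finished(self, thread, data):
        print(thread.name, data)

class MyThread(Thread):
    def __init__(self, callback):
        super(MyThread, self).__init__()
        self.callback = callback

    def run(self):
        # ... do the actual work ...
        self.callback(self, 42)   # hand back the thread along with its result

mgr = Manager()
MyThread(mgr.on_thread_finished).start()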
Anything wrong with doing it this way?
from threading import Thread

class Manager():
    def Test(self):
        MyThread(self.on_thread_finished).start()

    def on_thread_finished(self, data):
        print "on_thread_finished:", data

class MyThread(Thread):
    def __init__(self, callback):
        Thread.__init__(self)
        self.callback = callback

    def run(self):
        data = "hello"
        self.callback(data)

m = Manager()
m.Test() # prints "on_thread_finished: hello"
If you want the main thread to wait for child threads to finish execution, you are probably better off using some kind of synchronization mechanism. If you simply want to be notified when one or more threads has finished executing, a Condition is enough:
import threading

class MyThread(threading.Thread):
    def __init__(self, condition):
        threading.Thread.__init__(self)
        self.condition = condition

    def run(self):
        print "%s done" % threading.current_thread()
        with self.condition:
            self.condition.notify()

condition = threading.Condition()
condition.acquire()
thread = MyThread(condition)
thread.start()
condition.wait()
However, using a Queue is probably better, as it makes handling multiple worker threads a bit easier.
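For example, a brief sketch (not from the answer above) in which each worker reports completion on a shared queue that the main thread drains:

import threading
import queue

done = queue.Queue()

def work(n):
    # ... do the actual work for item n ...
    done.put(threading.current_thread().name)   # report completion

threads = [threading.Thread(target=work, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for _ in threads:
    print(done.get(), 'finished')   # blocks until each worker reports in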
