I have a class, that loads all resources into memory that are needed for my application (mostly images).
Then several threads need to access these resources through this class.
I don't want every instance to reload all resources, so I thought I use the Singleton Pattern.
I did it like this:
class DataContainer(object):
_instance = None
_lock = threading.Lock()
_initialised = True
def __new__(cls, *args, **kwargs):
with cls._lock:
if not cls._instance:
cls._initialised = False
cls._instance = object.__new__(cls, *args, **kwargs)
return cls._instance
def __init__(self, map_name = None):
# instance has already been created
if self._initialised:
return
self._initialised = True
# load images
This works fine, as long as I am not using multiple threads. But with multiple Threads every thread has a different instance. So using 4 threads, they each create a new instance.
I want all threads to use the same instance of this class,
so the resources are only loaded into memory once.
I also tried to do this in the same module where the class is defined, but outside the class definition:
def getDataContainer():
global dataContainer
return dataContainer
dataContainer = DataContainer()
but every thread still has its own instance.
I am new to python, if this is the wrong approach plz let me know,
I appreciate any help
To expand on #Will's comment, if a "shared object" is created by the parent, then passed in to each thread, all threads will share the same object.
(With processes, see the multiprocessing.Manager class, which directly support sharing state, including with modifications.)
import threading, time
class SharedObj(object):
image = 'beer.jpg'
class DoWork(threading.Thread):
def __init__(self, shared, *args, **kwargs):
super(DoWork,self).__init__(*args, **kwargs)
self.shared = shared
def run(self):
print threading.current_thread(), 'start'
time.sleep(1)
print 'shared', self.shared.image, id(self.shared)
print threading.current_thread(), 'done'
myshared = SharedObj()
threads = [ DoWork(shared=myshared, name='a'),
DoWork(shared=myshared, name='b')
]
for t in threads:
t.start()
for t in threads:
t.join()
print 'DONE'
Output:
<DoWork(a, started 140381090318080)> start
<DoWork(b, started 140381006067456)> start
shared beer.jpg shared140381110335440
<DoWork(b, started 140381006067456)> done
beer.jpg 140381110335440
<DoWork(a, started 140381090318080)> done
DONE
Note that the thread IDs are different, but they both use the same SharedObj instance, at memory address ending in 440.
Related
I have a large Python 3.6 system where multiple processes and threads interact with each other and the user. Simplified, there is a Scheduler instance (subclasses threading.Thread) and a Worker instance (subclasses multiprocessing.Process). Both objects run for the entire duration of the program.
The user interacts with the Scheduler by adding Task instances and the Scheduler passes the task to the Worker at the correct moment in time. The worker uses the information contained in the task to do its thing.
Below is some stripped out and simplified code out of the project:
class Task:
def __init__(self, name:str):
self.name = name
self.state = 'idle'
class Scheduler(threading.Thread):
def __init__(self, worker:Worker):
super().init()
self.worker = worker
self.start()
def run(self):
while True:
# Do stuff until the user schedules a new task
task = Task() # <-- In reality the Task intance is not created here but the thread gets it from elsewhere
task.state = 'scheduled'
self.worker.change_task(task)
# Do stuff until the task.state == 'finished'
class Worker(multiprocessing.Process):
def __init__(self):
super().init()
self.current_task = None
self.start()
def change_task(self, new_task:Task):
self.current_task = new_task
self.current_task.state = 'accepted-idle'
def run(self):
while True:
# Do stuff until the current task is updated
self.current_task.state = 'accepted-running'
# Task is running
self.current_task.state = 'finished'
The system used to be structured so that the task contained multiple multiprocessing.Events indicating each of its possible states. Then, not the whole Task instance was passed to the worker, but each of the task's attributes was. As they were all multiprocessing safe, it worked, with a caveat. The events changed in worker.run had to be created in worker.run and back passed to the task object for it work. Not only is this a less than ideal solution, it no longer works with some changes I am making to the project.
Back to the current state of the project, as described by the python code above. As is, this will never work because nothing makes this multiprocessing safe at the moment. So I implemented a Proxy/BaseManager structure so that when a new Task is needed, the system gets it from the multiprocessing manager. I use this structure in a sightly different way elsewhere in the project as well. The issue is that the worker.run never knows that the self.current_task is updated, it remains None. I expected this to be fixed by using the proxy but clearly I am mistaken.
def Proxy(target: typing.Type) -> typing.Type:
"""
Normally a Manager only exposes only object methods. A NamespaceProxy can be used when registering the object with
the manager to expose all the attributes. This also works for attributes created at runtime.
https://stackoverflow.com/a/68123850/8353475
1. Instead of exposing all the attributes manually, we effectively override __getattr__ to do it dynamically.
2. Instead of defining a class that subclasses NamespaceProxy for each specific object class that needs to be
proxied, this method is used to do it dynamically. The target parameter should be the class of the object you want
to generate the proxy for. The generated proxy class will be returned.
Example usage: FooProxy = Proxy(Foo)
:param target: The class of the object to build the proxy class for
:return The generated proxy class
"""
# __getattr__ is called when an attribute 'bar' is called from 'foo' and it is not found eg. 'foo.bar'. 'bar' can
# be a class method as well as a variable. The call gets rerouted from the base object to this proxy, were it is
# processed.
def __getattr__(self, key):
result = self._callmethod('__getattribute__', (key,))
# If attr call was for a method we need some further processing
if isinstance(result, types.MethodType):
# A wrapper around the method that passes the arguments, actually calls the method and returns the result.
# Note that at this point wrapper() does not get called, just defined.
def wrapper(*args, **kwargs):
# Call the method and pass the return value along
return self._callmethod(key, args, kwargs)
# Return the wrapper method (not the result, but the method itself)
return wrapper
else:
# If the attr call was for a variable it can be returned as is
return result
dic = {'types': types, '__getattr__': __getattr__}
proxy_name = target.__name__ + "Proxy"
ProxyType = type(proxy_name, (NamespaceProxy,), dic)
# This is a tuple of all the attributes that are/will be exposed. We copy all of them from the base class
ProxyType._exposed_ = tuple(dir(target))
return ProxyType
class TaskManager(BaseManager):
pass
TaskProxy = Proxy(Task)
TaskManager.register('get_task', callable=Task, proxytype=TaskProxy)
I'm trying to understand why python can not compile the following class.
class SharedResource(multiprocessing.Lock):
def __init__(self, blocking=True, timeout=-1):
# super().__init__(blocking=True, timeout=-1)
self.blocking = blocking
self.timeout = timeout
self.data = {}
TypeError: method expected 2 arguments, got 3
The reason why I'm subclassing Lock
my objective is to create a shared list of resource that should be usable only by on process at a time.
this concept will be eventually in a Flash application where the request should not be able to use the resource concurrently
RuntimeError: Lock objects should only be shared between processes through inheritance
class SharedResource():
def __init__(self, id, model):
'''
id: mode id
model: Keras Model only one worker at a time can call predict
'''
self.mutex = Lock()
self.id = id
self.model = model
manager = Manager()
shared_list = manager.list() # a List of models
shared_list.append(SharedResource())
def worker1(l):
...read some data
while True:
resource = l[0]
with m:
resource['model'].predict(...some data)
time.sleep(60)
if __name__ == "__main__":
processes = [ Process(target=worker1, args=[shared_list])]
for p in processes:
p.start()
for p in processes:
p.join()
The reason you are getting this error is because multiprocessing.Lock is actually a function.
In .../multiprocessing/context.py there are these lines:
def Lock(self):
'''Returns a non-recursive lock object'''
from .synchronize import Lock
return Lock(ctx=self.get_context())
This may change in the future so you can verify this on your version of python by doing:
import multiprocessing
print(type(multiprocessing.Lock))
To actually subclass Lock you will need to do something like this:
from multiprocessing import synchronize
from multiprocessing.synchronize import Lock
# Since Lock is now a class, this should work:
class SharedResource(Lock):
pass
I'm not endorsing this approach as a "good" solution, but it should solve your problem if you really need to subclass Lock. Subclassing things that try to avoid being subclassed is usually not a great idea, but sometimes it can be necessary. If you can solve the problem in a different way you may want to consider that.
I'd like to use Django as a relay to another layer. When the user makes a request, a thrift client calls a function that goes to another software to retrieve the result.
I understand the potential thread-safety issues, which is why there will be an object pool of clients based on Python's multiprocessing, and every user request will pop one, use it, and put it back.
The object pool has to be initialized at the beginning of the program, and destroyed (hopefully cleanly) when the program exits (when the server is interrupted with SIGINT). The motivation to this question is to know how to do this in the correct way without hacks.
The only way I found so far is the following prototype with global variables, given that the following is the views file with function based views:
clients_pool = ClientsPool("127.0.0.1", 9090, size=64) # pool size
def get_data(request):
global clients_pool
c = clients_pool.pop() # borrow a client from the pool
res = c.get_data(request.body) # call the remote function with thrift
clients_pool.release(c) # restore client to the pool
return HttpResponse(res)
def get_something(request):
global clients_pool
c = clients_pool.pop() # borrow a client from the pool
res = c.get_something(request.body) # call the remote function with thrift
clients_pool.release(c) # restore client to the pool
return HttpResponse(res)
Is this the correct way to do this? Or does Django offer a better mechanism to do this?
I would make sure multiple goals are satisfied here. First make sure always only one instance of your pool manager is created. You could place it in your app's __init__.py. I don't know if this threadsafe. I doubt it.
Second you want to delegate the management of the pool to the application layer. Right now you are managing the pool in the view layer. You always want to make sure pool clients are released I suppose. This would lead to something like this:
# [app_name]/__init__.py
class Singleton(type):
_instances = {}
def __call__(cls, *args, **kwargs):
if cls not in cls._instances:
cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
return cls._instances[cls]
class PoolManager(metaclass=Singleton):
def __init__(self, *args, **kwargs)
self.pool = ClientsPool(*args, **kwargs)
pool_manager = PoolManager(host='127.0.0.1', port=9090, size=64)
# views.py / consider moving the context manager to seperate file
from [app_name] import pool_manager
class PoolContextManager:
def __init__(self):
self.pool = pool_manager
def __enter__(self):
return self.pool.pop()
def __exit__(self, type, value, traceback):
self.pool.release(c)
def get_data(request):
with PoolContextManager() as pm:
res = pm.get_data(request.body)
return HttpResponse(res)
def get_something(request):
with PoolContextManager() as pm:
res = pm.get_something(request.body)
return HttpResponse(res)
I create Singleton class using Metaclass, it working good in multithreadeds and create only one instance of MySingleton class but in multiprocessing, it creates always new instance
import multiprocessing
class SingletonType(type):
# meta class for making a class singleton
def __call__(cls, *args, **kwargs):
try:
return cls.__instance
except AttributeError:
cls.__instance = super(SingletonType, cls).__call__(*args, **kwargs)
return cls.__instance
class MySingleton(object):
# singleton class
__metaclass__ = SingletonType
def __init__(*args,**kwargs):
print "init called"
def task():
# create singleton class instance
a = MySingleton()
# create two process
pro_1 = multiprocessing.Process(target=task)
pro_2 = multiprocessing.Process(target=task)
# start process
pro_1.start()
pro_2.start()
My output:
init called
init called
I need MySingleton class init method get called only once
Each of your child processes runs its own instance of the Python interpreter, hence the SingletonType in one process doesn't share its state with those in another process. This means that a true singleton that only exists in one of your processes will be of little use, because you won't be able to use it in the other processes: while you can manually share data between processes, that is limited to only basic data types (for example dicts and lists).
Instead of relying on singletons, simply share the underlying data between the processes:
#!/usr/bin/env python3
import multiprocessing
import os
def log(s):
print('{}: {}'.format(os.getpid(), s))
class PseudoSingleton(object):
def __init__(*args,**kwargs):
if not shared_state:
log('Initializating shared state')
with shared_state_lock:
shared_state['x'] = 1
shared_state['y'] = 2
log('Shared state initialized')
else:
log('Shared state was already initalized: {}'.format(shared_state))
def task():
a = PseudoSingleton()
if __name__ == '__main__':
# We need the __main__ guard so that this part is only executed in
# the parent
log('Communication setup')
shared_state = multiprocessing.Manager().dict()
shared_state_lock = multiprocessing.Lock()
# create two process
log('Start child processes')
pro_1 = multiprocessing.Process(target=task)
pro_2 = multiprocessing.Process(target=task)
pro_1.start()
pro_2.start()
# Wait until processes have finished
# See https://stackoverflow.com/a/25456494/857390
log('Wait for children')
pro_1.join()
pro_2.join()
log('Done')
This prints
16194: Communication setup
16194: Start child processes
16194: Wait for children
16200: Initializating shared state
16200: Shared state initialized
16201: Shared state was already initalized: {'x': 1, 'y': 2}
16194: Done
However, depending on your problem setting there might be better solutions using other mechanisms of inter-process communication. For example, the Queue class is often very useful.
I have a Manager (main thread), that creates other Threads to handle various operations.
I would like my Manager to be notified when a Thread it created ends (when run() method execution is finished).
I know I could do it by checking the status of all my threads with the Thread.isActive() method, but polling sucks, so I wanted to have notifications.
I was thinking of giving a callback method to the Threads, and call this function at the end of the run() method:
class Manager():
...
MyThread(self.on_thread_finished).start() # How do I pass the callback
def on_thread_finished(self, data):
pass
...
class MyThread(Thread):
...
def run(self):
....
self.callback(data) # How do I call the callback?
...
Thanks!
The thread can't call the manager unless it has a reference to the manager. The easiest way for that to happen is for the manager to give it to the thread at instantiation.
class Manager(object):
def new_thread(self):
return MyThread(parent=self)
def on_thread_finished(self, thread, data):
print thread, data
class MyThread(Thread):
def __init__(self, parent=None):
self.parent = parent
super(MyThread, self).__init__()
def run(self):
# ...
self.parent and self.parent.on_thread_finished(self, 42)
mgr = Manager()
thread = mgr.new_thread()
thread.start()
If you want to be able to assign an arbitrary function or method as a callback, rather than storing a reference to the manager object, this becomes a bit problematic because of method wrappers and such. It's hard to design the callback so it gets a reference to both the manager and the thread, which is what you will want. I worked on that for a while and did not come up with anything I'd consider useful or elegant.
Anything wrong with doing it this way?
from threading import Thread
class Manager():
def Test(self):
MyThread(self.on_thread_finished).start()
def on_thread_finished(self, data):
print "on_thread_finished:", data
class MyThread(Thread):
def __init__(self, callback):
Thread.__init__(self)
self.callback = callback
def run(self):
data = "hello"
self.callback(data)
m = Manager()
m.Test() # prints "on_thread_finished: hello"
If you want the main thread to wait for children threads to finish execution, you are probably better off using some kind of synchronization mechanism. If simply being notified when one or more threads has finished executing, a Condition is enough:
import threading
class MyThread(threading.Thread):
def __init__(self, condition):
threading.Thread.__init__(self)
self.condition = condition
def run(self):
print "%s done" % threading.current_thread()
with self.condition:
self.condition.notify()
condition = threading.Condition()
condition.acquire()
thread = MyThread(condition)
thread.start()
condition.wait()
However, using a Queue is probably better, as it makes handling multiple worker threads a bit easier.