I'm trying to make a memoize decorator that works with multiple threads.
I understood that I need to use the cache as a shared object between the threads, and acquire/lock the shared object. I'm of course launching the threads:
for i in range(5):
thread = threading.Thread(target=self.worker, args=(self.call_queue,))
thread.daemon = True
thread.start()
where worker is:
def worker(self, call):
func, args, kwargs = call.get()
self.returns.put(func(*args, **kwargs))
call.task_done()
The problem starts, of course, when I'm sending a function decorated with a memo function (like this) to many threads at the same time.
How can I implement the memo's cache as a shared object among threads?
The most straightforward way is to employ a single lock for the entire cache, and require that any writes to the cache grab the lock first.
In the example code you posted, at line 31, you would acquire the lock and check to see if the result is still missing, in which case you would go ahead and compute and cache the result. Something like this:
lock = threading.Lock()
...
except KeyError:
with lock:
if key in self.cache:
v = self.cache[key]
else:
v = self.cache[key] = f(*args,**kwargs),time.time()
The example you posted stores a cache per function in a dictionary, so you'd need to store a lock per function as well.
If you were using this code in a highly contentious environment, though, it would probably be unacceptably inefficient, since threads would have to wait on each other even if they weren't calculating the same thing. You could probably improve on this by storing a lock per key in your cache. You'll need to globally lock access to the lock storage as well, though, or else there's a race condition in creating the per-key locks.
Related
In the documentation, Manager is used with a context manager (i.e. with) like so:
from multiprocessing.managers import BaseManager
class MathsClass:
def add(self, x, y):
return x + y
def mul(self, x, y):
return x * y
class MyManager(BaseManager):
pass
MyManager.register('Maths', MathsClass)
if __name__ == '__main__':
with MyManager() as manager:
maths = manager.Maths()
print(maths.add(4, 3)) # prints 7
print(maths.mul(7, 8)) # prints 56
But what is the benefit of this, with the exception of the namespace? For opening file streams, the benefit is quite obvious in that you don't have to manually .close() the connection, but what is it for Manager? If you don't use it in a context, what steps do you have to use to ensure that everything is closed properly?
In short, what is the benefit of using the above over something like:
manager = MyManager()
maths = manager.Maths()
print(maths.add(4, 3)) # prints 7
print(maths.mul(7, 8)) # prints 56
But what is the benefit of this (...)?
First, you get the primary benefit of almost any context managers. You have a well-defined lifetime for the resource. It is allocated and acquired when the with ...: block is opened. It is released when the blocks ends (either by reaching the end or because an exception is raised). It is still deallocated whenever the garbage collector gets around to it but this is of less concern since the external resource has already been released.
In the case of multiprocessing.Manager (which is a function that returns a SyncManager, even though Manager looks lot like a class), the resource is a "server" process that holds state and a number of worker processes that share that state.
what is [the benefit of using a context manager] for Manager?
If you don't use a context manager and you don't call shutdown on the manager then the "server" process will continue running until the SyncManager's __del__ is run. In some cases, this might happen soon after the code that created the SyncManager is done (for example, if it is created inside a short function and the function returns normally and you're using CPython then the reference counting system will probably quickly notice the object is dead and call its __del__). In other cases, it might take longer (if an exception is raised and holds on to a reference to the manager then it will be kept alive until that exception is dealt with). In some bad cases, it might never happen at all (if SyncManager ends up in a reference cycle then its __del__ will prevent the cycle collector from collecting it at all; or your process might crash before __del__ is called). In all these cases, you're giving up control of when the extra Python processes created by SyncManager are cleaned up. These processes may represent non-trivial resource usage on your system. In really bad cases, if you create SyncManager in a loop, you may end up creating many of these that live at the same time and could easily consume huge quantities of resources.
If you don't use it in a context, what steps do you have to use to ensure that everything is closed properly?
You have to implement the context manager protocol yourself, as you would for any context manager you used without with. It's tricky to do in pure-Python while still being correct. Something like:
manager = None
try:
manager = MyManager()
manager.__enter__()
# use it ...
except:
if manager is not None:
manager.__exit__(*exc_info())
else:
if manager is not None:
manager.__exit__(None, None, None)
start and shutdown are also aliases of __enter__ and __exit__, respectively.
According to a number of sources, including this question, passing a runnable as the target parameter in __init__ (with or without args and kwargs) is preferable to extending the Thread class.
If I create a runnable, how can I pass the thread it is running on as self to it without extending the Thread class? For example, the following would work fine:
class MyTask(Thread):
def run(self):
print(self.name)
MyTask().start()
However, I can't see a good way to get this version to work:
def my_task(t):
print(t.name)
Thread(target=my_task, args=(), kwargs={}).start()
This question is a followup to Python - How can I implement a 'stoppable' thread?, which I answered, but possibly incompletely.
Update
I've thought of a hack to do this using current_thread():
def my_task():
print(current_thread().name)
Thread(target=my_task).start()
Problem: calling a function to get a parameter that should ideally be passed in.
Update #2
I have found an even hackier solution that makes current_thread seem much more attractive:
class ThreadWithSelf(Thread):
def __init__(self, **kwargs):
args = kwargs.get('args', ())
args = (self,) + tuple(args)
kwargs[args] = args
super().__init__(**kwargs)
ThreadWithSelf(target=my_task).start()
Besides being incredibly ugly (e.g. by forcing the user to use keywords only, even if that is the recommended way in the documentation), this completely defeats the purpose of not extending Thread.
Update #3
Another ridiculous (and unsafe) solution: to pass in a mutable object via args and to update it afterwards:
def my_task(t):
print(t[0].name)
container = []
t = Thread(target=my_task, args=(container,))
container[0] = t
t.start()
To avoid synchronization issues, you could kick it up a notch and implement another layer of ridiculousness:
def my_task(t, i):
print(t[i].name)
container = []
container[0] = Thread(target=my_task, args=(container, 0))
container[1] = Thread(target=my_task, args=(container, 1))
for t in container:
t.start()
I am still looking for a legitimate answer.
It seems like your goal is to get access to the thread currently executing a task from within the task itself. You can't add the thread as an argument to the threading.Thread constructor, because it's not yet constructed. I think there are two real options.
If your task runs many times, potentially on many different threads, I think the best option is to use threading.current_thread() from within the task. This gives you access directly to the thread object, with which you can do whatever you want. This seems to be exactly the kind of use-case this function was designed for.
On the other hand, if your goal is implement a thread with some special characteristics, the natural choice is to subclass threading.Thread, implementing whatever extended behavior you wish.
Also, as you noted in your comment, insinstance(current_thread(), ThreadSubclass) will return True, meaning you can use both options and be assured that your task will have access to whatever extra behavior you've implemented in your subclass.
The simplest and most readable answer here is: use current_thread(). You can use various weird ways to pass the thread as a parameter, but there's no good reason for that. Calling current_thread() is the standard approach which is shorter than all the alternatives and will not confuse other developers. Don't try to overthink/overengineer this:
def runnable():
thread = threading.current_thread()
Thread(target=runnable).start()
If you want to hide it a bit and change for aesthetic reasons, you can try:
def with_current_thread(f):
return f(threading.current_thread())
#with_current_thread
def runnable(thread):
print(thread.name)
Thread(target=runnable).start()
If this is not good enough, you may get better answers by describing why you think the parameter passing is better / more correct for your use case.
Here is a simple secinaro:
class Test:
def __init__(self):
self.foo = []
def append(self, x):
self.foo.append(x)
def get(self):
return self.foo
def process_append_queue(append_queue, bar):
while True:
x = append_queue.get()
if x is None:
break
bar.append(x)
print("worker done")
def main():
import multiprocessing as mp
bar = Test()
append_queue = mp.Queue(10)
append_queue_process = mp.Process(target=process_append_queue, args=(append_queue, bar))
append_queue_process.start()
for i in range(100):
append_queue.put(i)
append_queue.put(None)
append_queue_process.join()
print str(bar.get())
if __name__=="__main__":
main()
When you call bar.get() at the end of the main() function why does it still return an empty list? How can I make it so that the child process also works with the same instance of Test not a new one?
All answers appreciated!
In general, processes have distinct address spaces, so that mutations of an object in one process have no effect on any object in any other process. Interprocess communication is needed to tell a process about changes made in another process.
That can be done explicitly (using things like multiprocessing.Queue), or implicitly if you use a facility implemented by multiprocessing for this purpose. For example, a great deal of work is done under the covers to make changes to a multiprocessing.Queue visible across processes.
The easiest way in your specific example is to replace your __init__ function like so:
def __init__(self):
import multiprocessing as mp
self.foo = mp.Manager().list()
It so happens that an mp.Manager instance supports a list() method that creates a process-aware list object (really a proxy for a list object, which forwards list operations to an under-the-covers server process that maintains a single copy of "the real" list - the list object isn't really shared across processes, because that's impossible - but the proxies make it appear to be shared).
So if you make that change, your code will display the results you expect - and there is no simpler way.
Note that multiprocessing works better the less IPC (interprocess communication) you need, and that's true pretty much regardless of application or programming language.
Objects are copied between processes by pickling them and passing the string over a pipe. There is no way to achieve true "shared memory" for pure Python objects between processes. To achieve precisely this type of synchronization, take a look at the multiprocessing.Manager documentation (https://docs.python.org/2/library/multiprocessing.html#managers) which provides you with examples about synchronized versions of common Python container types. These are "proxied" containers where operations on the proxy send all arguments across the process boundary, pickled, and are then executed in the parent process.
So I create a class variable called executor in my class
executor = ThreadPoolExecutor(100)
and instead of having functions and methods and using decorators, I simply use following line to handle my blocking tasks(like io and hash creation and....) in my async methods
result = await to_tornado_future(self.executor.submit(blocking_method, param1, param2)
I decided to use this style cause
1- decorators are slower by nature
2- there is no need for extra methods and functions
3- it workes as expected and creates no threads before it needed
Am I right ? Please use reasons(I want to know if the way I use, is slower or uses more resources or....)
Update
Based on Ben answer, my above approach was not correct
so I ended up using following function as needed, I think it's the best way to go
def pool(pool_executor, fn, *args, **kwargs):
new_future = Future()
result_future = pool_executor.submit(fn, *args, **kwargs)
result_future.add_done_callback(lambda f: new_future.set_result(f.result()))
return new_future
usage:
result = await pool(self.executor, time.sleep, 3)
This is safe as long as all your blocking methods are thread-safe. Since you mentioned doing IO in these threads, I'll point out that doing file IO here is fine but all network IO in Tornado must occur on the IOLoop's thread.
Why do you say "decorators are slower by nature"? Which decorators are slower than what? Some decorators have no performance overhead at all (although most do have some runtime cost). to_tornado_future(executor.submit()) isn't free either. (BTW, I think you want tornado.gen.convert_yielded instead of tornado.platform.asyncio.to_tornado_future. executor.submit doesn't return an asyncio.Future).
As a general rule, running blocking_method on a thread pool is going to be slower than just calling it directly. You should do this only when blocking_method is likely to block for long enough that you want the main thread free to do other things in the meantime.
I have an embarrassingly parallelizable problem consisting on a bunch of tasks that get solved independently of each other. Solving each of the tasks is quite lengthy, so this is a prime candidate for multi-processing.
The problem is that solving my tasks requires creating a specific object that is very time consuming on its own but can be reused for all the tasks (think of an external binary program that needs to be launched), so in the serial version I do something like this:
def costly_function(task, my_object):
solution = solve_task_using_my_object
return solution
def solve_problem():
my_object = create_costly_object()
tasks = get_list_of_tasks()
all_solutions = [costly_function(task, my_object) for task in tasks]
return all_solutions
When I try to parallelize this program using multiprocessing, my_object cannot be passed as a parameter for a number of reasons (it cannot be pickled, and it should not run more than one task at the same time), so I have to resort to create a separate instance of the object for each task:
def costly_function(task):
my_object = create_costly_object()
solution = solve_task_using_my_object
return solution
def psolve_problem():
pool = multiprocessing.Pool()
tasks = get_list_of_tasks()
all_solutions = pool.map_async(costly_function, tasks)
return all_solutions.get()
but the added costs of creating multiple instances of my_object makes this code only marginally faster than the serialized one.
If I could create a separate instance of my_object in each process and then reuse them for all the tasks that get run in that process, my timings would significantly improve. Any pointers on how to do that?
I found a simple way of solving my own problem without bringing in any tools besides the standard library, I thought I'd write it down here in case somebody else has a similar problem.
multiprocessing.Pool accepts an initializer function (with arguments) that gets run when each process is launched. The return value of this function is not stored anywhere, but one can take advantage of the function to set up a global variable:
def init_process():
global my_object
my_object = create_costly_object()
def costly_function(task):
global my_object
solution = solve_task_using_my_object
return solution
def psolve_problem():
pool = multiprocessing.Pool(initializer=init_process)
tasks = get_list_of_tasks()
all_solutions = pool.map_async(costly_function, tasks)
return all_solutions.get()
Since each process has a separate global namespace, the instantiated objects do not clash, and they are created only once per process.
Probably not the most elegant solution, but it's simple enough and gives me a near-linear speedup.
you can have celery project handle all this for you, among many other features it also have a way to run some task initialization that can be used latter by all tasks
You are right that you are constrained to pickable objects when using multiprocessing. Are you absolutely sure that your object is un-pickleable?
Have you tried dill? If you import it in, anytime pickle is called it will use the dill bindings. It worked for me, when I was trying to use multiprocessing on sympy equations.