Locking with Tornado and multiple instances - python

I'm fairly new to Python and Tornado, so please forgive me if I'm overcomplicating a long-solved problem, but I didn't find much else out there.
I'm running multiple Tornado instances (multiple instances per server, multiple servers) for an application and have some tasks that only one instance should perform, such as scheduling certain events in the application. Instead of running a dedicated instance that performs this task, I'd like to have an opportunistic approach where the first instance that tries gets to do the job.
Part of my solution is a database-based locking mechanism (MongoDB's findAndModify). The code below seems to work just fine, but I'd like some advice on whether this is a good solution, or whether there are ready-made locking and task-distribution solutions out there for Tornado.
This is the decorator that acquires the lock when entering the function and releases it afterwards:
import datetime
import functools
import logging

import tornado.gen
import tornado.ioloop

def locking(fn):
    @tornado.gen.engine
    def wrapped(wself, *args, **kwargs):
        @tornado.gen.engine
        def wrapped_callback(*cargs, **ckwargs):
            logging.info("release lock")
            yield tornado.gen.Task(lock.release_lock)
            logging.info("release lock done")
            original_callback(*cargs, **ckwargs)

        logging.info("acquire lock")
        yield tornado.gen.Task(model.SchedulerLock.initialize_lock,
                               area_id=wself.area_id)
        lock = yield tornado.gen.Task(model.SchedulerLock.acquire_lock,
                                      area_id=wself.area_id)
        if lock:
            logging.info("acquire lock done")
            original_callback = kwargs['callback']
            kwargs['callback'] = wrapped_callback
            fn(wself, *args, **kwargs)
        else:
            logging.info("acquire lock not possible, postponed")
            ioloop = tornado.ioloop.IOLoop.instance()
            ioloop.add_timeout(datetime.timedelta(seconds=2),
                               functools.partial(wrapped, wself, *args, **kwargs))
    return wrapped
The acquire_lock method returns the lock object, or False if the lock could not be acquired.
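For illustration, here is a minimal sketch of what such acquire_lock / release_lock methods could look like with pymongo's find_one_and_update; the collection name, field names, and TTL handling are assumptions rather than the actual model code:

import datetime

from pymongo import MongoClient, ReturnDocument

# Illustrative only: collection and field names are assumptions.
locks = MongoClient()["app"]["scheduler_locks"]

def acquire_lock(area_id, ttl_seconds=30):
    """Atomically claim the lock for area_id; return the lock document or False."""
    now = datetime.datetime.utcnow()
    doc = locks.find_one_and_update(
        # Match only if the lock is free, or its previous holder timed out.
        # Assumes initialize_lock has upserted {"_id": area_id, "locked": False}.
        {"_id": area_id,
         "$or": [{"locked": False}, {"expires_at": {"$lt": now}}]},
        {"$set": {"locked": True,
                  "expires_at": now + datetime.timedelta(seconds=ttl_seconds)}},
        return_document=ReturnDocument.AFTER,
    )
    return doc if doc else False

def release_lock(area_id):
    """Mark the lock for area_id as free again."""
    locks.update_one({"_id": area_id}, {"$set": {"locked": False}})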
Any thoughts on this? I know that the lock is only half of the solution, as I also need a mechanism that ensures that a one-off task only gets done once. However, this can be achieved very similarly. Is there anything that solves the problem more elegantly?

Related

In an event-loop based server (FastAPI), are async/await functions atomic?

I've read so much about async/await, coroutines, and single-threaded, non-blocking servers. But I'm still not 100% sure about this situation, and I need to be.
I'm running FastAPI.
I have an endpoint:
@router.get("/get-next-id")
async def get_next_id():
    return await id_module.get_next_id()
It calls (and awaits) this async function in id_module (I know there are safer ways to generate a sequential ID; this is just an example to demonstrate a possible concurrency issue):
async def get_next_id():
    next_id = None
    with open("id_flat_file.json") as serialized_data_file:
        id_dict = json.load(serialized_data_file)
        next_id = id_dict["id"] + 1
    with open("id_flat_file.json", 'w') as serialized_data_file:
        json.dump({"id": next_id}, serialized_data_file)
    return next_id
Questions:
Is get_next_id() guaranteed to run on the main thread of the event loop?
If so, is it thread safe? I.e., is it atomic to the main thread? Could it ever be 'paused' and interrupted by the main thread (which could also be running the same code for another request)? I.e., in my code, do I have to worry about a race condition and somehow otherwise protect (lock) against that?
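For what it's worth, if the read-modify-write does turn out to need guarding, one common pattern is a module-level asyncio.Lock around it. A minimal sketch follows; the _id_lock name is an assumption, and this only serializes coroutines within one process, so multiple worker processes would still need an external (file or database) lock:

import asyncio
import json

# Assumed helper, not in the original code: serializes access to the id file
# for all coroutines running in this process.
_id_lock = asyncio.Lock()

async def get_next_id():
    async with _id_lock:
        with open("id_flat_file.json") as serialized_data_file:
            id_dict = json.load(serialized_data_file)
        next_id = id_dict["id"] + 1
        with open("id_flat_file.json", "w") as serialized_data_file:
            json.dump({"id": next_id}, serialized_data_file)
        return next_id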

Shared pool map between processes with object-oriented python

(python2.7)
I'm trying to build a kind of scanner that has to walk through CFG nodes and split into different processes at branches, for parallelism purposes.
The scanner is represented by an object of class Scanner. This class has one method, traverse, that walks through the graph and splits if necessary.
Here is how it looks:
class Scanner(object):
    def __init__(self, atrb1, ...):
        self.attribute1 = atrb1
        self.process_pool = Pool(processes=4)

    def traverse(self, ...):
        [...]
        if branch:
            self.process_pool.map(my_func, todo_list)
My problem is the following:
How do I create an instance of multiprocessing.Pool that is shared between all of my processes? I want it to be shared because a path can be split again, and I do not want to end up with a kind of fork bomb; having the same Pool will help me limit the number of processes running at the same time.
The above code does not work, since a Pool cannot be pickled. Consequently, I have tried this:
class Scanner(object):
    def __getstate__(self):
        self_dict = self.__dict__.copy()
        del self_dict['process_pool']
        return self_dict
    [...]
But obviously, it results in having self.process_pool not defined in the created processes.
Then, I tried to create a Pool as a module attribute:
process_pool = Pool(processes=4)

def my_func(x):
    [...]

class Scanner(object):
    def __init__(self, atrb1, ...):
        self.attribute1 = atrb1

    def traverse(self, ...):
        [...]
        if branch:
            process_pool.map(my_func, todo_list)
It does not work, and this answer explains why.
But here's the thing: wherever I create my Pool, something is missing. If I create this Pool at the end of my file, it does not see self.attribute1 (the same way it did not see answer) and fails with an AttributeError.
I'm not even trying to share it yet, and I'm already stuck with multiprocessing's way of doing things.
I don't know if I have been thinking about the whole thing incorrectly, but I cannot believe it's so complicated to handle something as simple as "having a worker pool and giving it tasks".
Thank you,
EDIT:
I resolved my first problem (the AttributeError): my class had a callback as an attribute, and this callback was defined in the main script file, after the import of the scanner module... But the concurrency and "do not fork bomb" requirement is still a problem.
What you want to do can't be done safely. Think about what happens if you somehow had a single Pool shared across parent and worker processes, with, say, two worker processes. The parent runs a map that tries to perform two tasks, and each of those tasks needs to map two more tasks. The two parent-dispatched tasks go to each worker, and the parent blocks. Each worker sends two more tasks to the shared pool and blocks waiting for them to complete. But now all workers are occupied, waiting for a worker to become free; you've deadlocked.
A safer approach would be to have the workers return enough information to dispatch additional tasks in the parent. Then you could do something like:
import collections
import multiprocessing

class MoreWork(object):
    def __init__(self, func, *args):
        self.func = func
        self.args = args

pool = multiprocessing.Pool()
try:
    base_task = somefunc, someargs
    outstanding = collections.deque([pool.apply_async(*base_task)])
    while outstanding:
        result = outstanding.popleft().get()
        if isinstance(result, MoreWork):
            outstanding.append(pool.apply_async(result.func, result.args))
        else:
            # ... do something with a "final" result, maybe breaking the loop ...
            pass
finally:
    pool.terminate()
What the functions are is up to you; they'd just return information in a MoreWork object when there is more to do, rather than launching a task directly. The point is to ensure that, by having the parent be solely responsible for task dispatch and the workers solely responsible for task completion, you can't deadlock due to all workers being blocked waiting for tasks that are in the queue but not being processed.
This is also not at all optimized; ideally, you wouldn't block waiting on the first item in the queue if other items in the queue were already complete. It's a lot easier to do this with the concurrent.futures module, specifically with concurrent.futures.wait, which waits on the first available result from an arbitrary number of outstanding tasks, but you'd need a third-party PyPI package (the futures backport) to get concurrent.futures on Python 2.7.
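A rough sketch of that concurrent.futures variant, using wait with return_when=FIRST_COMPLETED so the parent reacts to whichever task finishes first (Python 3 syntax; somefunc, someargs, and MoreWork are the same placeholders as above):

import concurrent.futures

with concurrent.futures.ProcessPoolExecutor() as pool:
    outstanding = {pool.submit(somefunc, *someargs)}
    while outstanding:
        # Wake as soon as any outstanding task finishes, not just the oldest one.
        done, outstanding = concurrent.futures.wait(
            outstanding, return_when=concurrent.futures.FIRST_COMPLETED)
        for fut in done:
            result = fut.result()
            if isinstance(result, MoreWork):
                outstanding.add(pool.submit(result.func, *result.args))
            else:
                pass  # do something with a "final" result, maybe breaking out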

Thread blocks in an RLock

I have this implementation:
import threading

def mlock(f):
    '''Method lock. Uses a class lock to execute the method.'''
    def wrapper(self, *args, **kwargs):
        with self._lock:
            res = f(self, *args, **kwargs)
        return res
    return wrapper

class Lockable(object):
    def __init__(self):
        self._lock = threading.RLock()
Which I use in several places, for example:
class Fifo(Lockable):
    '''Implementation of a FIFO. It will grow until the given maxsize;
    then it will drop the head to add new elements.'''
    def __init__(self, maxsize, name='FIFO', data=None, inserted=0, dropped=0):
        self.maxsize = maxsize
        self.name = name
        self.inserted = inserted
        self.dropped = dropped
        self._fifo = []
        self._cnt = None
        Lockable.__init__(self)
        if data:
            for d in data:
                self.put(d)

    @mlock
    def __len__(self):
        length = len(self._fifo)
        return length
...
The application is quite complex, but it works well. Just to make sure, I have been doing stress tests of the running service, and I find that it sometimes (rarely) deadlocks in the mlock. I assume another thread is holding the lock and not releasing it. How can I debug this? Please note that:
it is very difficult to reproduce: I need hours of testing to deadlock
the application is running in the background
once it deadlocks, I can not interact with it anymore
I would like to know:
what thread is holding the lock?
why is it not being released? I am using a context manager to acquire the lock, so it should always be released. Where is the bug?!
What options do I have to further debug this?
I have been checking whether there is any way of knowing which thread is holding an RLock, but it seems there is no API for this.
I don't think there's an easy solution for this, but it can be done with some work.
Personally, I've found the following useful (albeit in C++).
Start by creating a Lockable base class that tracks threads' interactions with it. A Lockable object will use an additional (non-recursive) lock to protect a dictionary mapping thread ids to their interactions with it:
When a thread tries to lock, it (locks and) creates an entry.
When it acquires the lock, it (locks and) modifies the entry.
When it releases the lock, it (locks and) removes the entry.
Additionally, a Lockable object will have a low-priority thread, waking up very rarely (once every several minutes), and seeing if there's indication of a deadlock (approximated by the event that a thread has been holding the lock for a long time, while at least one other thread has waited for it).
The entry for a thread should therefore include:
the operation's time
the stacktrace info leading to the operation.
The problem is that this can alter the relative timing of threads, which might cause your program to go into different execution paths than it normally does.
Here you need to get creative. You might need to also induce (random) time lapses in these (and possibly other) operations.
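A minimal Python sketch of that idea follows; the class and method names are assumptions, the watchdog simply prints instead of logging, and nested (re-entrant) acquisitions are not tracked correctly in this simplified version:

import threading
import time
import traceback

class TrackedLockable(object):
    """Recursive lock that records which threads hold it or are waiting on it."""

    def __init__(self, check_interval=300):
        self._lock = threading.RLock()
        self._meta = threading.Lock()   # non-recursive lock protecting _entries
        self._entries = {}              # thread ident -> (state, timestamp, stack)
        watchdog = threading.Thread(target=self._watch, args=(check_interval,))
        watchdog.daemon = True
        watchdog.start()

    def _record(self, state):
        with self._meta:
            self._entries[threading.current_thread().ident] = (
                state, time.time(), traceback.format_stack())

    def acquire(self):
        self._record('waiting')   # entry created when the thread tries to lock
        self._lock.acquire()
        self._record('holding')   # entry modified once the lock is acquired

    def release(self):
        self._lock.release()
        with self._meta:          # entry removed on release
            self._entries.pop(threading.current_thread().ident, None)

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()

    def _watch(self, interval):
        # Low-priority watchdog: wake rarely and report long-held locks that
        # other threads are waiting for (an approximation of a deadlock).
        while True:
            time.sleep(interval)
            now = time.time()
            with self._meta:
                entries = list(self._entries.items())
            waiters = [tid for tid, (state, _, _) in entries if state == 'waiting']
            for tid, (state, since, stack) in entries:
                if state == 'holding' and now - since > interval and waiters:
                    print('Possible deadlock: thread %r has held the lock for '
                          '%.0f s while %r wait(s):\n%s'
                          % (tid, now - since, waiters, ''.join(stack)))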

Dynamically allocating and destroying mutexes?

I have an application that's built on top of Eventlet.
I'm trying to write a decent decorator for synchronizing calls to certain methods across threads.
The decorator currently looks something like this:
from eventlet import semaphore

_semaphores_semaphore = semaphore.Semaphore()
_semaphores = {}

def synchronized(name):
    def wrap(f):
        def inner(*args, **kwargs):
            # Grab the lock protecting _semaphores.
            with _semaphores_semaphore:
                # If the named semaphore does not yet exist, create it.
                if name not in _semaphores:
                    _semaphores[name] = semaphore.Semaphore()
                sem = _semaphores[name]
            with sem:
                return f(*args, **kwargs)
        return inner
    return wrap
This works fine and looks nice and thread-safe to me, although this whole thread-safety-and-locking business might be a bit rusty for me.
The problem is that a specific, existing use of semaphores elsewhere in the application, which I want to convert to use this decorator, creates these semaphores on the fly: based on user input, it has to create a file. It checks in a dict whether it already has a semaphore for this file; if not, it creates one and locks it. Once it's done and has released the lock, it checks whether the semaphore has been locked again (by another process in the meantime), and if not, it deletes the semaphore. This code is written with the assumption of green threads and is safe in that context, but I can't work out how to convert it to use my decorator.
If I don't care about cleaning up the possibly-never-to-be-used-again semaphores (there could be hundreds of thousands of these), I'm fine. If I do want to clean them up, I'm not sure what to do.
To delete the semaphore, it seems obvious that I need to be holding _semaphores_semaphore, since I'm manipulating the _semaphores dict, and I have to do something with the specific semaphore too, but everything I can think of seems to be racy:
While inside the "with sem:" block, I could grab _semaphores_semaphore and delete sem from _semaphores. However, other threads might be blocked waiting for it (at "with sem:"), and if a new thread comes along wanting to touch the same resource, it will not find the same semaphore in _semaphores but instead create a new one => fail.
I could improve this slightly by checking the balance of sem to see if another thread is already waiting for me to release it. If so, leave it alone; if not, delete it. This way, the last thread waiting to act on the resource will delete it. However, if a thread has just left the "with _semaphores_semaphore:" block but hasn't yet made it to "with sem:", I have the same problem as before => fail.
It feels like I'm missing something obvious, but I can't work out what it is.
I think you might be able to solve it with a reader-writer lock (a.k.a. shared-exclusive lock) on the _semaphores dict.
This is untested code, to show the principle. An RWLock implementation can be found in e.g. http://code.activestate.com/recipes/413393-multiple-reader-one-writer-mrow-resource-locking/
_semaphores_rwlock = RWLock()
_semaphores = {}

def synchronized(name):
    def wrap(f):
        def inner(*args, **kwargs):
            lock = _semaphores_rwlock.reader()
            # If the named semaphore does not yet exist, create it.
            if name not in _semaphores:
                lock = _semaphores_rwlock.writer()
                _semaphores[name] = semaphore.Semaphore()
            sem = _semaphores[name]
            with sem:
                retval = f(*args, **kwargs)
            lock.release()
            return retval
        return inner
    return wrap
When you want to clean up you do:
wlock = _semaphores_rwlock.writer() #this might take a while; it waits for all readers to release
cleanup(_semaphores)
wlock.release()
mchro's answer didn't work for me since it blocks all threads on a single semaphore whenever one thread needs to create a new semaphore.
The answer that I came up with is to keep counters of occupants between the two transactions with _semaphores (which are both done behind the same mutex):
A: get semaphore
A1: dangerzone
B: with sem: block etc
C: cleanup semaphore
The problem is knowing how many people are between A and C. The counter of the semaphore doesn't tell you that, since someone may be in A1. The answer is to keep a counter of entrants along with each semaphore in _semaphores, increment it at A, decrement it at C, and if it's at 0 then you know that there's no-one else in A-C with the same key and you can safely delete it.
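A minimal sketch of that entrant-counting approach, mirroring the decorator from the question (eventlet's semaphore module is assumed; A, B and C mark the steps listed above):

from eventlet import semaphore

_semaphores_semaphore = semaphore.Semaphore()
_semaphores = {}   # name -> (semaphore, number of threads between A and C)

def synchronized(name):
    def wrap(f):
        def inner(*args, **kwargs):
            # A: look up (or create) the named semaphore and count ourselves in.
            with _semaphores_semaphore:
                sem, entrants = _semaphores.get(name, (None, 0))
                if sem is None:
                    sem = semaphore.Semaphore()
                _semaphores[name] = (sem, entrants + 1)
            try:
                # B: run the protected section.
                with sem:
                    return f(*args, **kwargs)
            finally:
                # C: count ourselves out; the last one out deletes the entry.
                with _semaphores_semaphore:
                    sem, entrants = _semaphores[name]
                    if entrants == 1:
                        del _semaphores[name]
                    else:
                        _semaphores[name] = (sem, entrants - 1)
        return inner
    return wrap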

Waiting on event with Twisted and PB

I have a Python app that uses multiple threads, and I am curious about the best way to wait for something in Python without burning CPU or locking the GIL.
My app uses Twisted, and I spawn a thread to run a long operation so I do not stomp on the reactor thread. This long operation also spawns some threads using Twisted's deferToThread to do something else, and the original thread wants to wait for the results from the Deferreds.
What I have been doing is this:
while self._waiting:
    time.sleep(0.01)
which seemed to prevent Twisted PB's objects from receiving messages, so I thought sleep was locking the GIL. Further investigation by the posters below revealed, however, that it does not.
Better ways to wait on threads without blocking the reactor thread or Python are posted below.
If you're already using Twisted, you should never need to "wait" like this.
As you've described it:
I spawn a thread to run a long operation ... This long operation also spawns some threads using twisted's deferToThread ...
That implies that you're calling deferToThread from your "long operation" thread, not from your main thread (the one where reactor.run() is running). As Jean-Paul Calderone already noted in a comment, you can only call Twisted APIs (such as deferToThread) from the main reactor thread.
The lock-up that you're seeing is a common symptom of not following this rule. It has nothing to do with the GIL, and everything to do with the fact that you have put Twisted's reactor into a broken state.
Based on your loose description of your program, I've tried to write a sample program that does what you're talking about, built entirely on Twisted APIs, spawning all threads via Twisted and controlling them all from the main reactor thread.
import time

from twisted.internet import reactor
from twisted.internet.defer import gatherResults
from twisted.internet.threads import deferToThread, blockingCallFromThread

def workReallyHard():
    "'Work' function, invoked in a thread."
    time.sleep(0.2)

def longOperation():
    for x in range(10):
        workReallyHard()
        blockingCallFromThread(reactor, startShortOperation, x)
    result = blockingCallFromThread(reactor, gatherResults, shortOperations)
    return 'hooray', result

def shortOperation(value):
    workReallyHard()
    return value * 100

shortOperations = []

def startShortOperation(value):
    def done(result):
        print 'Short operation complete!', result
        return result
    shortOperations.append(
        deferToThread(shortOperation, value).addCallback(done))

d = deferToThread(longOperation)

def allDone(result):
    print 'Long operation complete!', result
    reactor.stop()

d.addCallback(allDone)

reactor.run()
Note that at the point in allDone where the reactor is stopped, you could fire off another "long operation" and have it start the process all over again.
Have you tried condition variables? They are used like this:
from threading import Condition

condition = Condition()

def consumer_in_thread_A():
    condition.acquire()
    try:
        while resource_not_yet_available:
            condition.wait()
        # Here, the resource is available and may be
        # consumed
    finally:
        condition.release()

def produce_in_thread_B():
    # ... create resource, whatsoever
    condition.acquire()
    try:
        condition.notify_all()
    finally:
        condition.release()
Condition variables act as locks (acquire and release), but their main purpose is to provide a control mechanism that allows waiting for them to be notify()-d or notify_all()-d.
"I recently found out that calling time.sleep(X) will lock the GIL for the entire time X and therefore freeze ALL python threads for that time period."
You found wrongly -- this is definitely not how it works. What's the source where you found this misinformation?
Anyway, then you clarify (in comments -- better edit your Q!) that you're using deferToThread and your problem with this is that...:
"Well yes I defer the action to a thread and give twisted a callback. But the parent thread needs to wait for the whole series of sub threads to complete before it can move onto a new set of sub threads to spawn."
So use as the callback a method of an object with a counter -- start it at 0, increment it by one every time you're deferring-to-thread and decrement it by one in the callback method.
When the callback method sees that the decremented counter has gone back to 0, it knows that we're done waiting "for the whole series of sub threads to complete" and then the time has come to "move on to a new set of sub threads to spawn", and thus, in that case only, calls the "spawn a new set of sub threads" function or method -- it's that easy!
E.g. (net of typos &c as this is untested code, just to give you the idea)...:
from twisted.internet import threads

class Waiter(object):
    def __init__(self, what_next, *a, **k):
        self.counter = 0
        self.what_next = what_next
        self.a = a
        self.k = k

    def one_more(self):
        self.counter += 1

    def do_wait(self, *dont_care):
        self.counter -= 1
        if self.counter == 0:
            self.what_next(*self.a, **self.k)

def spawn_one_thread(waiter, long_calculation, *a, **k):
    waiter.one_more()
    d = threads.deferToThread(long_calculation, *a, **k)
    d.addCallback(waiter.do_wait)

def spawn_all(waiter, list_of_lists_of_functions_args_and_kwds):
    if not list_of_lists_of_functions_args_and_kwds:
        return
    if waiter is None:
        # None here means a fresh Waiter will also be created for the next batch.
        waiter = Waiter(spawn_all, None, list_of_lists_of_functions_args_and_kwds)
    this_time = list_of_lists_of_functions_args_and_kwds.pop(0)
    for f, a, k in this_time:
        spawn_one_thread(waiter, f, *a, **k)

def start_it_all(list_of_lists_of_functions_args_and_kwds):
    spawn_all(None, list_of_lists_of_functions_args_and_kwds)
According to the Python source, time.sleep() does not hold the GIL.
http://code.python.org/hg/trunk/file/98e56689c59c/Modules/timemodule.c#l920
Note the use of Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS, as documented here:
http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock
The threading module allows you to spawn a thread, which is then represented by a Thread object. That object has a join method that you can use to wait for the subthread to complete.
See http://docs.python.org/library/threading.html#module-threading
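A minimal sketch of that pattern (long_calculation is just a placeholder):

import threading
import time

def long_calculation():
    time.sleep(1)   # placeholder for the real work

worker = threading.Thread(target=long_calculation)
worker.start()
worker.join()       # blocks until the subthread completes, without busy-waiting
print("subthread finished")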
