Thread blocks in an RLock - python

I have this implementation:
def mlock(f):
'''Method lock. Uses a class lock to execute the method'''
def wrapper(self, *args, **kwargs):
with self._lock:
res = f(self, *args, **kwargs)
return res
return wrapper
class Lockable(object):
def __init__(self):
self._lock = threading.RLock()
Which I use in several places, for example:
class Fifo(Lockable):
'''Implementation of a Fifo. It will grow until the given maxsize; then it will drop the head to add new elements'''
def __init__(self, maxsize, name='FIFO', data=None, inserted=0, dropped=0):
self.maxsize = maxsize
self.name = name
self.inserted = inserted
self.dropped = dropped
self._fifo = []
self._cnt = None
Lockable.__init__(self)
if data:
for d in data:
self.put(d)
#mlock
def __len__(self):
length = len(self._fifo)
return length
...
The application is quite complex, but it works well. Just to make sure, I have been doing stress tests of the running service, and I find that it sometimes (rarely) deadlocks in the mlock. I assume another thread is holding the lock and not releasing it. How can I debug this? Please note that:
it is very difficult to reproduce: I need hours of testing to deadlock
the application is running in the background
once it deadlocks, I can not interact with it anymore
I would like to know:
what thread is holding the lock?
why is it not being released? I am using a context manager to acquire the lock, so it should always be released. Where is the bug?!
What options do I have to further debug this?
I have been checking if there is any way of knowing what thread is holding an RLock, but it seems there is not API for this.

I don't think there's an easy solution for this, but it can be done with some work.
Personally, I've found the following useful (albeit in C++).
Start by creating a Lockable base that uses tracks threads' interactions with it. A Lockable object will use an additional (non-recursive) lock for protecting a dictionary mapping thread ids to interactions with it:
When a thread tries to lock, it (locks and) creates an entry.
When it acquires the lock, it (locks and) modifies the entry.
When it releases the lock, it (locks and) removes the entry.
Additionally, a Lockable object will have a low-priority thread, waking up very rarely (once every several minutes), and seeing if there's indication of a deadlock (approximated by the event that a thread has been holding the lock for a long time, while at least one other thread has waited for it).
The entry for a thread should therefore include:
the operation's time
the stacktrace info leading to the operation.
The problem is that this can alter the relative timing of threads, which might cause your program to go into different execution paths than it normally does.
Here you need to get creative. You might need to also induce (random) time lapses in these (and possibly other) operations.

Related

Shared pool map between processes with object-oriented python

(python2.7)
I'm trying to do a kind of scanner, that has to walk through CFG nodes, and split in different processes on branching for parallelism purpose.
The scanner is represented by an object of class Scanner. This class has one method traverse that walks through the said graph and splits if necessary.
Here how it looks:
class Scanner(object):
def __init__(self, atrb1, ...):
self.attribute1 = atrb1
self.process_pool = Pool(processes=4)
def traverse(self, ...):
[...]
if branch:
self.process_pool.map(my_func, todo_list).
My problem is the following:
How do I create a instance of multiprocessing.Pool, that is shared between all of my processes ? I want it to be shared, because since a path can be splitted again, I do not want to end with a kind of fork bomb, and having the same Pool will help me to limit the number of processes running at the same time.
The above code does not work, since Pool can not be pickled. In consequence, I have tried that:
class Scanner(object):
def __getstate__(self):
self_dict = self.__dict__.copy()
def self_dict['process_pool']
return self_dict
[...]
But obviously, it results in having self.process_pool not defined in the created processes.
Then, I tried to create a Pool as a module attribute:
process_pool = Pool(processes=4)
def my_func(x):
[...]
class Scanner(object):
def __init__(self, atrb1, ...):
self.attribute1 = atrb1
def traverse(self, ...):
[...]
if branch:
process_pool.map(my_func, todo_list)
It does not work, and this answer explains why.
But here comes the thing, wherever I create my Pool, something is missing. If I create this Pool at the end of my file, it does not see self.attribute1, the same way it did not see answer and fails with an AttributeError.
I'm not even trying to share it yet, and I'm already stuck with Multiprocessing way of doing thing.
I don't know if I have not been thinking correctly the whole thing, but I can not believe it's so complicated to handle something as simple as "having a worker pool and giving them tasks".
Thank you,
EDIT:
I resolved my first problem (AttributeError), my class had a callback as its attribute, and this callback was defined in the main script file, after the import of the scanner module... But the concurrency and "do not fork bomb" thing is still a problem.
What you want to do can't be done safely. Think about if you somehow had a single shared Pool shared across parent and worker processes, with, say, two worker processes. The parent runs a map that tries to perform two tasks, and each task needs to map two more tasks. The two parent dispatched tasks go to each worker, and the parent blocks. Each worker sends two more tasks to the shared pool and blocks for them to complete. But now all workers are now occupied, waiting for a worker to become free; you've deadlocked.
A safer approach would be to have the workers return enough information to dispatch additional tasks in the parent. Then you could do something like:
class MoreWork(object):
def __init__(self, func, *args):
self.func = func
self.args = args
pool = multiprocessing.Pool()
try:
base_task = somefunc, someargs
outstanding = collections.deque([pool.apply_async(*base_task)])
while outstanding:
result = outstanding.popleft().get()
if isinstance(result, MoreWork):
outstanding.append(pool.apply_async(result.func, result.args))
else:
... do something with a "final" result, maybe breaking the loop ...
finally:
pool.terminate()
What the functions are is up to you, they'd just return information in a MoreWork when there was more to do, not launch a task directly. The point is to ensure that by having the parent be solely responsible for task dispatch, and the workers solely responsible for task completion, you can't deadlock due to all workers being blocked waiting for tasks that are in the queue, but not being processed.
This is also not at all optimized; ideally, you wouldn't block waiting on the first item in the queue if other items in the queue were complete; it's a lot easier to do this with the concurrent.futures module, specifically with concurrent.futures.wait to wait on the first available result from an arbitrary number of outstanding tasks, but you'd need a third party PyPI package to get concurrent.futures on Python 2.7.

Locking with Tornado and multiple instances

I'm fairly new to Python and Tornado, so please forgive if I overcomplicated a long-solved problem, but I didn't find much else out there.
I'm running multiple Tornado instances (multiple instances per server, multiple servers) for an application and have some tasks that only one instance should perform, such as scheduling certain events in the application. Instead of running a dedicated instance that performs this task, I'd like to have an opportunistic approach where the first instance that tries gets to do the job.
Part of my solution is a database based locking mechanism (MongoDB findAndUpdate). The code below seems to work just fine but I'd like to get some advice if this is a good solution or if there are ready-made locking and task distribution solutions out there for Tornado.
This is the decorator that acquires the lock when entering the function and releases it afterwards:
def locking(fn):
#tornado.gen.engine
def wrapped(wself, *args, **kwargs):
#tornado.gen.engine
def wrapped_callback(*cargs, **ckwargs):
logging.info("release lock")
yield tornado.gen.Task(lock.release_lock)
logging.info("release lock done")
original_callback(*cargs, **ckwargs)
logging.info("acquire lock")
yield tornado.gen.Task(model.SchedulerLock.initialize_lock, area_id=wself.area_id)
lock = yield tornado.gen.Task(model.SchedulerLock.acquire_lock, area_id=wself.area_id)
if lock:
logging.info("acquire lock done")
original_callback = kwargs['callback']
kwargs['callback'] = wrapped_callback
fn(wself, *args, **kwargs)
else:
logging.info("acquire lock not possible, postponed")
ioloop = tornado.ioloop.IOLoop.instance()
ioloop.add_timeout(datetime.timedelta(seconds=2), functools.partial(wrapped, wself, *args, **kwargs))
return wrapped
The acquire_lock method returns the lock object or False
Any thoughts on this? I know that the lock is only half of the solution, as I also need a mechanism that ensures that a one-off task only gets done once. However, this can be achieved very similarly. Is there anything that solves the problem more elegantly?

Is this an insane implementation of producer consumer type thing?

# file1.py
class _Producer(self):
def __init__(self):
self.chunksize = 6220800
with open('/dev/zero') as f:
self.thing = f.read(self.chunksize)
self.n = 0
self.start()
def start(self):
import subprocess
import threading
def produce():
self._proc = subprocess.Popen(['producer_proc'], stdout=subprocess.PIPE)
while True:
self.thing = self._proc.stdout.read(self.chunksize)
if len(self.thing) != self.chunksize:
msg = 'Expected {0} bytes. Read {1} bytes'.format(self.chunksize, len(self.thing))
raise Exception(msg)
self.n += 1
t = threading.Thread(target=produce)
t.daemon = True
t.start()
self._thread = t
def stop(self):
if self._thread.is_alive():
self._proc.terminate()
self._thread.join(1)
producer = _Producer()
producer.start()
I have written some code more or less like the above design, and now I want to be able to consume the output of producer_proc in other files by going:
# some_other_file.py
import file1
my_thing = file1.producer.thing
Multiple other consumers might be grabbing a reference to file.producer.thing, they all need to use from the same producer_proc. And the producer_proc should never be blocked. Is this a sane implementation? Does the python GIL make it thread safe, or do I need to reimplement using a Queue for getting data of the worker thread? Do consumers need to explicitly make a copy of the thing?
I guess am trying to implement something like Producer/Consumer pattern or Observer pattern, but I'm not really clear on all the technical details of design patterns.
A single producer is constantly making things
Multiple consumers using things at arbitrary times
producer.thing should be replaced by a fresh thing as soon as the new one is available, most things will go unused but that's ok
It's OK for multiple consumers to read the same thing, or to read the same thing twice in succession. They only want to be sure they have got the most recent thing when asked for it, not some stale old thing.
A consumer should be able to keep using a thing as long as they have it in scope, even though the producer may have already overwritten his self.thing with a fresh new thing.
Given your (unusual!) requirements, your implementation seems correct. In particular,
If you're only updating one attribute, the Python GIL should be sufficient. Single bytecode instructions are atomic.
If you do anything more complex, add locking! It's basically harmless anyway - if you cared about performance or multicore scalability, you probably wouldn't be using Python!
In particular, be aware that self.thing and self.n in this code are updated in a separate bytecode instructions. The GIL could be released/acquired between, so you can't get a consistent view of the two of them unless you add locking. If you're not going to do that, I'd suggest removing self.n as it's an "attractive nuisance" (easily misused) or at least adding a comment/docstring with this caveat.
Consumers don't need to make a copy. You're not ever mutating a particular object pointed to by self.thing (and couldn't with string objects; they're immutable) and Python is garbage-collected, so as long as a consumer grabbed a reference to it, it can keep accessing it without worrying too much about what other threads are doing. The worst that could happen is your program using a lot of memory from several generations of self.thing being kept alive.
I'm a bit curious where your requirements came from. In particular, that you don't care if a thing is never used or used many times.

Dynamically allocating and destroying mutexes?

I have an application that's built on top of Eventlet.
I'm trying to write a decent decorator for synchronizing calls to certain methods across threads.
The decorator currently looks something like this:
_semaphores_semaphore = semaphore.Semaphore()
_semaphores = {}
def synchronized(name):
def wrap(f):
def inner(*args, **kwargs):
# Grab the lock protecting _semaphores.
with _semaphores_semaphore:
# If the named semaphore does not yet exist, create it.
if name not in _semaphores:
_semaphores[name] = semaphore.Semaphore()
sem = _semaphores[name]
with sem:
return f(*args, **kwargs)
This works fine, and looks nice and thread safe to me, although this whole thread safety and locking business might be a bit rusty for me.
The problem is that a specific, existing use of semaphores elsewhere in the application, which I'm wanting to convert to using this decorator, creates these semaphores on the fly: Based on user input, it has to create a file. It checks in a dict whether it already has a semaphore for this file, if not, it creates one, and locks it. Once it's done and has released the lock, it checks if it's been locked again (by another process in the mean time), and if not, it deletes the semaphore. This code is written with the assumption of green threads and is safe in that context, but if I want to convert it to use my decorator, and this is what I can't work out.
If I don't care about cleaning up the possibly-never-to-be-used-again semaphores (there could be hundreds of thousands of these), I'm fine. If I do want to clean them up, I'm not sure what to do.
To delete the semaphore, it seems obvious that I need to be holding the _semaphores_semaphore, since I'm manipulating the _semaphores dict, but I have to do something with the specific semaphore, too, but everything I can think of seems to be racy:
* While inside the "with sem:" block, I could grab the _semaphores_semaphore and sem from _semaphores. However, other threads might be blocked waiting for it (at "with sem:"), and if a new thread comes along wanting to touch the same resource, it will not find the same semaphore in _semaphores, but instead create a new one => fail.
I could improve this slightly by checking the balance of sem to see if another thread is already waiting for me to release it. If so, leave it alone, if not, delete it. This way, the last thread waiting to act on the resource will delete it. However, if a thread has just left the "with _semaphores_semaphore:" block, but hasn't yet made it to "with sem:", I have the same problem as before => fail.
It feels like I'm missing something obvious, but I can't work out what it is.
I think you might be able to solve it with a reader-writer lock aka. shared-exclusive lock on the _semaphores dict.
This is untested code, to show the principle. An RWLock implementation can be found in e.g. http://code.activestate.com/recipes/413393-multiple-reader-one-writer-mrow-resource-locking/
_semaphores_rwlock = RWLock()
_semaphores = {}
def synchronized(name):
def wrap(f):
def inner(*args, **kwargs):
lock = _semaphores_rwlock.reader()
# If the named semaphore does not yet exist, create it.
if name not in _semaphores:
lock = _semaphores_rwlock.writer()
_semaphores[name] = semaphore.Semaphore()
sem = _semaphores[name]
with sem:
retval = f(*args, **kwargs)
lock.release()
return retval
When you want to clean up you do:
wlock = _semaphores_rwlock.writer() #this might take a while; it waits for all readers to release
cleanup(_semaphores)
wlock.release()
mchro's answer didn't work for me since it blocks all threads on a single semaphore whenever one thread needs to create a new semaphore.
The answer that I came up with is to keep counters of occupants between the two transactions with _semaphores (which are both done behind the same mutex):
A: get semaphore
A1: dangerzone
B: with sem: block etc
C: cleanup semaphore
The problem is knowing how many people are between A and C. The counter of the semaphore doesn't tell you that, since someone may be in A1. The answer is to keep a counter of entrants along with each semaphore in _semaphores, increment it at A, decrement it at C, and if it's at 0 then you know that there's no-one else in A-C with the same key and you can safely delete it.

Python Locking Implementation (with threading module)

This is probably a rudimentary question, but I'm new to threaded programming in Python and am not entirely sure what the correct practice is.
Should I be creating a single lock object (either globally or being passed around) and using that everywhere that I need to do locking? Or, should I be creating multiple lock instances in each of the classes where I will be employing them. Take these 2 rudimentary code samples, which direction is best to go? The main difference being that a single lock instance is used in both class A and B in the second, while multiple instances are used in the first.
Sample 1
class A():
def __init__(self, theList):
self.theList = theList
self.lock = threading.Lock()
def poll(self):
while True:
# do some stuff that eventually needs to work with theList
self.lock.acquire()
try:
self.theList.append(something)
finally:
self.lock.release()
class B(threading.Thread):
def __init__(self,theList):
self.theList = theList
self.lock = threading.Lock()
self.start()
def run(self):
while True:
# do some stuff that eventually needs to work with theList
self.lock.acquire()
try:
self.theList.remove(something)
finally:
self.lock.release()
if __name__ == "__main__":
aList = []
for x in range(10):
B(aList)
A(aList).poll()
Sample 2
class A():
def __init__(self, theList,lock):
self.theList = theList
self.lock = lock
def poll(self):
while True:
# do some stuff that eventually needs to work with theList
self.lock.acquire()
try:
self.theList.append(something)
finally:
self.lock.release()
class B(threading.Thread):
def __init__(self,theList,lock):
self.theList = theList
self.lock = lock
self.start()
def run(self):
while True:
# do some stuff that eventually needs to work with theList
self.lock.acquire()
try:
self.theList.remove(something)
finally:
self.lock.release()
if __name__ == "__main__":
lock = threading.Lock()
aList = []
for x in range(10):
B(aList,lock)
A(aList,lock).poll()
If you use a separate lock object in each class then you run a risk of deadlocking, e.g. if one operation claims the lock for A and then claims the lock for B while a different operation claims B and then A.
If you use a single lock then you're forcing code to single thread when different operations could be run in parallel. That isn't always as serious in Python (which has a global lock in any case) as in other languages, but say you were to hold a global lock while writing to a file Python would release the GIL but you'd have blocked everything else.
So it's a tradeoff. I'd say go for little locks as that way you maximise the chance for parallel execution, but take care never to claim more than one lock at a time, and try not to hold onto a lock for any longer than you absolutely have to.
So far as your specific examples go, the first one is just plain broken. If you lock operations on theList then you must use the same lock every time or you aren't locking anything. That may not matter here as list.append and list.remove are effectively atomic anyway, but if you do need to lock access to the list you need to be sure to use the same lock every time. The best way to do that is to hold the list and a lock as attributes of a class and force all access to the list to go through methods of the containing class. Then pass the container class around not the list or the lock.
In the general case, a single global lock is less efficient (more contention) but safer (no risk of deadlock) as long as it's a RLock (reentrant) rather than a plain Lock.
The potential problems come when a thread that's executing while holding a lock tries to acquire another (or the same) lock, for example by calling another method that contains the acquire call. If a thread that's already holding a lock tries to acquire it again, it will block forever if the lock's a plain Lock, but proceed smoothly if it's a slightly more complex RLock -- that's why the latter is called reentrant, because the thread holding it can "enter" (acquire the lock) again. Essentially, a RLock keeps track of which thread holds it, and how many time the thread has acquired the lock, while the simpler Lock does not keep such information around.
With multiple locks, the deadlock problem comes when one thread tries to acquire lock A then lock B, while another tries to acquire first lock B, then lock A. If that occurs, then sooner or later you'll be in a situation where the first lock holds A, the second one holds B, and each tries to acquire the lock that the other one is holding -- so both block forever.
One way to prevent multiple-lock deadlocks is to make sure that locks are always acquired in the same order, whatever thread is doing the acquiring. However, when each instance has its own lock, that's exceedingly difficult to organize with any clarity and simplicity.

Categories

Resources