How can I reproduce the race conditions in this python code reliably? - python

Context
I recently posted a timer class for review on Code Review. I'd had a gut feeling there were concurrency bugs as I'd once seen 1 unit test fail, but was unable to reproduce the failure. Hence my post to code review.
I got some great feedback highlighting various race conditions in the code. (I thought) I understood the problem and the solution, but before making any fixes, I wanted to expose the bugs with a unit test. When I tried, I realised it was difficult. Various stack exchange answers suggested I'd have to control the execution of threads to expose the bug(s) and any contrived timing would not necessarily be portable to a different machine. This seemed like a lot of accidental complexity beyond the problem I was trying to solve.
Instead I tried using the best static analysis (SA) tool for python, PyLint, to see if it'd pick out any of the bugs, but it couldn't. Why could a human find the bugs through code review (essentially SA), but a SA tool could not?
Afraid of trying to get Valgrind working with python (which sounded like yak-shaving), I decided to have a bash at fixing the bugs without reproducing them first. Now I'm in a pickle.
Here's the code now.
from threading import Timer, Lock
from time import time
class NotRunningError(Exception): pass
class AlreadyRunningError(Exception): pass
class KitchenTimer(object):
'''
Loosely models a clockwork kitchen timer with the following differences:
You can start the timer with arbitrary duration (e.g. 1.2 seconds).
The timer calls back a given function when time's up.
Querying the time remaining has 0.1 second accuracy.
'''
PRECISION_NUM_DECIMAL_PLACES = 1
RUNNING = "RUNNING"
STOPPED = "STOPPED"
TIMEUP = "TIMEUP"
def __init__(self):
self._stateLock = Lock()
with self._stateLock:
self._state = self.STOPPED
self._timeRemaining = 0
def start(self, duration=1, whenTimeup=None):
'''
Starts the timer to count down from the given duration and call whenTimeup when time's up.
'''
with self._stateLock:
if self.isRunning():
raise AlreadyRunningError
else:
self._state = self.RUNNING
self.duration = duration
self._userWhenTimeup = whenTimeup
self._startTime = time()
self._timer = Timer(duration, self._whenTimeup)
self._timer.start()
def stop(self):
'''
Stops the timer, preventing whenTimeup callback.
'''
with self._stateLock:
if self.isRunning():
self._timer.cancel()
self._state = self.STOPPED
self._timeRemaining = self.duration - self._elapsedTime()
else:
raise NotRunningError()
def isRunning(self):
return self._state == self.RUNNING
def isStopped(self):
return self._state == self.STOPPED
def isTimeup(self):
return self._state == self.TIMEUP
#property
def timeRemaining(self):
if self.isRunning():
self._timeRemaining = self.duration - self._elapsedTime()
return round(self._timeRemaining, self.PRECISION_NUM_DECIMAL_PLACES)
def _whenTimeup(self):
with self._stateLock:
self._state = self.TIMEUP
self._timeRemaining = 0
if callable(self._userWhenTimeup):
self._userWhenTimeup()
def _elapsedTime(self):
return time() - self._startTime
Question
In the context of this code example, how can I expose the race conditions, fix them, and prove they're fixed?
Extra points
extra points for a testing framework suitable for other implementations and problems rather than specifically to this code.
Takeaway
My takeaway is that the technical solution to reproduce the identified race conditions is to control the synchronism of two threads to ensure they execute in the order that will expose a bug. The important point here is that they are already identified race conditions. The best way I've found to identify race conditions is to put your code up for code review and encourage more expert people analyse it.

Traditionally, forcing race conditions in multithreaded code is done with semaphores, so you can force a thread to wait until another thread has achieved some edge condition before continuing.
For example, your object has some code to check that start is not called if the object is already running. You could force this condition to make sure it behaves as expected by doing something like this:
starting a KitchenTimer
having the timer block on a semaphore while in the running state
starting the same timer in another thread
catching AlreadyRunningError
To do some of this you may need to extend the KitchenTimer class. Formal unit tests will often use mock objects which are defined to block at critical times. Mock objects are a bigger topic than I can address here, but googling "python mock object" will turn up a lot of documentation and many implementations to choose from.
Here's a way that you could force your code to throw AlreadyRunningError:
import threading
class TestKitchenTimer(KitchenTimer):
_runningLock = threading.Condition()
def start(self, duration=1, whenTimeUp=None):
KitchenTimer.start(self, duration, whenTimeUp)
with self._runningLock:
print "waiting on _runningLock"
self._runningLock.wait()
def resume(self):
with self._runningLock:
self._runningLock.notify()
timer = TestKitchenTimer()
# Start the timer in a subthread. This thread will block as soon as
# it is started.
thread_1 = threading.Thread(target = timer.start, args = (10, None))
thread_1.start()
# Attempt to start the timer in a second thread, causing it to throw
# an AlreadyRunningError.
try:
thread_2 = threading.Thread(target = timer.start, args = (10, None))
thread_2.start()
except AlreadyRunningError:
print "AlreadyRunningError"
timer.resume()
timer.stop()
Reading through the code, identify some of the boundary conditions you want to test, then think about where you would need to pause the timer to force that condition to arise, and add Conditions, Semaphores, Events, etc. to make it happen. e.g. what happens if, just as the timer runs the whenTimeUp callback, another thread tries to stop it? You can force that condition by making the timer wait as soon as it's entered _whenTimeUp:
import threading
class TestKitchenTimer(KitchenTimer):
_runningLock = threading.Condition()
def _whenTimeup(self):
with self._runningLock:
self._runningLock.wait()
KitchenTimer._whenTimeup(self)
def resume(self):
with self._runningLock:
self._runningLock.notify()
def TimeupCallback():
print "TimeupCallback was called"
timer = TestKitchenTimer()
# The timer thread will block when the timer expires, but before the callback
# is invoked.
thread_1 = threading.Thread(target = timer.start, args = (1, TimeupCallback))
thread_1.start()
sleep(2)
# The timer is now blocked. In the parent thread, we stop it.
timer.stop()
print "timer is stopped: %r" % timer.isStopped()
# Now allow the countdown thread to resume.
timer.resume()
Subclassing the class you want to test isn't an awesome way to instrument it for testing: you'll have to override basically all of the methods in order to test race conditions in each one, and at that point there's a good argument to be made that you're not really testing the original code. Instead, you may find it cleaner to put the semaphores right in the KitchenTimer object but initialized to None by default, and have your methods check if testRunningLock is not None: before acquiring or waiting on the lock. Then you can force races on the actual code that you're submitting.
Some reading on Python mock frameworks that may be helpful. In fact, I'm not sure that mocks would be helpful in testing this code: it's almost entirely self-contained and doesn't rely on many external objects. But mock tutorials sometimes touch on issues like these. I haven't used any of these, but the documentation on these like a good place to get started:
Getting Started with Mock
Using Fudge
Python Mock Testing Techniques and Tools

The most common solution to testing thread (un)safe code is to start a lot of threads and hope for the best. The problem I, and I can imagine others, have with this is that it relies on chance and it makes tests 'heavy'.
As I ran into this a while ago I wanted to go for precision instead of brute force. The result is a piece of test code to cause race-conditions by letting the threads race neck to neck.
Sample racey code
spam = []
def set_spam():
spam[:] = foo()
use(spam)
If set_spam is called from several threads, a race condition exists between modification and use of spam. Let's try to reproduce it consistently.
How to cause race-conditions
class TriggeredThread(threading.Thread):
def __init__(self, sequence=None, *args, **kwargs):
self.sequence = sequence
self.lock = threading.Condition()
self.event = threading.Event()
threading.Thread.__init__(self, *args, **kwargs)
def __enter__(self):
self.lock.acquire()
while not self.event.is_set():
self.lock.wait()
self.event.clear()
def __exit__(self, *args):
self.lock.release()
if self.sequence:
next(self.sequence).trigger()
def trigger(self):
with self.lock:
self.event.set()
self.lock.notify()
Then to demonstrate the use of this thread:
spam = [] # Use a list to share values across threads.
results = [] # Register the results.
def set_spam():
thread = threading.current_thread()
with thread: # Acquires the lock.
# Set 'spam' to thread name
spam[:] = [thread.name]
# Thread 'releases' the lock upon exiting the context.
# The next thread is triggered and this thread waits for a trigger.
with thread:
# Since each thread overwrites the content of the 'spam'
# list, this should only result in True for the last thread.
results.append(spam == [thread.name])
threads = [
TriggeredThread(name='a', target=set_spam),
TriggeredThread(name='b', target=set_spam),
TriggeredThread(name='c', target=set_spam)]
# Create a shifted sequence of threads and share it among the threads.
thread_sequence = itertools.cycle(threads[1:] + threads[:1])
for thread in threads:
thread.sequence = thread_sequence
# Start each thread
[thread.start() for thread in threads]
# Trigger first thread.
# That thread will trigger the next thread, and so on.
threads[0].trigger()
# Wait for each thread to finish.
[thread.join() for thread in threads]
# The last thread 'has won the race' overwriting the value
# for 'spam', thus [False, False, True].
# If set_spam were thread-safe, all results would be true.
assert results == [False, False, True], "race condition triggered"
assert results == [True, True, True], "code is thread-safe"
I think I explained enough about this construction so you can implement it for your own situation. I think this fits the 'extra points' section quite nicely:
extra points for a testing framework suitable for other implementations and problems rather than specifically to this code.
Solving race-conditions
Shared variables
Each threading issue is solved in it's own specific way. In the example above I caused a race-condition by sharing a value across threads. Similar problems can occur when using global variables, such as a module attribute. The key to solving such issues may be to use a thread-local storage:
# The thread local storage is a global.
# This may seem weird at first, but it isn't actually shared among threads.
data = threading.local()
data.spam = [] # This list only exists in this thread.
results = [] # Results *are* shared though.
def set_spam():
thread = threading.current_thread()
# 'get' or set the 'spam' list. This actually creates a new list.
# If the list was shared among threads this would cause a race-condition.
data.spam = getattr(data, 'spam', [])
with thread:
data.spam[:] = [thread.name]
with thread:
results.append(data.spam == [thread.name])
# Start the threads as in the example above.
assert all(results) # All results should be True.
Concurrent reads/writes
A common threading issue is the problem of multiple threads reading and/or writing to a data holder concurrently. This problem is solved by implementing a read-write lock. The actual implementation of a read-write lock may differ. You may choose a read-first lock, a write-first lock or just at random.
I'm sure there are examples out there describing such locking techniques. I may write an example later as this is quite a long answer already. ;-)
Notes
Have a look at the threading module documentation and experiment with it a bit. As each threading issue is different, different solutions apply.
While on the subject of threading, have a look at the Python GIL (Global Interpreter Lock). It is important to note that threading may not actually be the best approach in optimizing performance (but this is not your goal). I found this presentation pretty good: https://www.youtube.com/watch?v=zEaosS1U5qY

You can test it by using a lot of threads:
import sys, random, thread
def timeup():
sys.stdout.write("Timer:: Up %f" % time())
def trdfunc(kt, tid):
while True :
sleep(1)
if not kt.isRunning():
if kt.start(1, timeup):
sys.stdout.write("[%d]: started\n" % tid)
else:
if random.random() < 0.1:
kt.stop()
sys.stdout.write("[%d]: stopped\n" % tid)
sys.stdout.write("[%d] remains %f\n" % ( tid, kt.timeRemaining))
kt = KitchenTimer()
kt.start(1, timeup)
for i in range(1, 100):
thread.start_new_thread ( trdfunc, (kt, i) )
trdfunc(kt, 0)
A couple of problem problems I see:
When a thread sees the timer as not running and try to start it, the
code generally raises an exception due to context switch in between
test and start. I think raising an exception is too much. Or you can
have an atomic testAndStart function
A similar problem occurs with stop. You can implement a testAndStop
function.
Even this code from the timeRemaining function:
if self.isRunning():
self._timeRemaining = self.duration - self._elapsedTime()
Needs some sort of atomicity, perhaps you need to grab a lock before
testing isRunning.
If you plan to share this class between threads, you need to address these issues.

In general - this is not viable solution. You can reproduce this race condition by using debugger (set breakpoints in some locations in the code, than, when it hits one of the breakpoints - freeze the thread and run the code until it hits another breakpoint, then freeze this thread and unfreeze the first thread, you can interleave threads execution in any way using this technique).
The problem is - the more threads and code you have, the more ways to interleave side effects they will have. Actually - it will grow exponentially. There is no viable solution to test it in general. It is possible only in some simple cases.
The solution to this problem are well known. Write code that is aware of it's side effects, control side effects with synchronisation primitives like locks, semaphores or queues or use immutable data if its possible.
Maybe more practical way is to use runtime checks to force correct call order. For example (pseudocode):
class RacyObject:
def __init__(self):
self.__cnt = 0
...
def isReadyAndLocked(self):
acquire_object_lock
if self.__cnt % 2 != 0:
# another thread is ready to start the Job
return False
if self.__is_ready:
self.__cnt += 1
return True
# Job is in progress or doesn't ready yet
return False
release_object_lock
def doJobAndRelease(self):
acquire_object_lock
if self.__cnt % 2 != 1:
raise RaceConditionDetected("Incorrect order")
self.__cnt += 1
do_job()
release_object_lock
This code will throw exception if you doesn't check isReadyAndLock before calling doJobAndRelease. This can be tested easily using only one thread.
obj = RacyObject()
...
# correct usage
if obj.isReadyAndLocked()
obj.doJobAndRelease()

Related

Python threading design

I'm trying to write a mini-game that allows me to practice my python threading skill. The game itself involves with timed bombs and citys that have them.
Here is my code:
class City(threading.Thread):
def __init__(self, name):
super().__init__()
self.name = name
self.bombs = None
self.activeBomb = None
self.bombID = 0
self.exploded = False
def addBomb(self, name, time, puzzle, answer, hidden=False):
self.bombs.append(Bomb(name, self.bombID, time, puzzle, answer, hidden))
self.activeBomb.append(self.bombID)
self.bombID += 1
def run(self):
for b in self.bombs:
b.start()
while True:
# listen to the bombs in the self.bombs # The part that I dont know how
# if one explodes
# print(self.name + ' has been destroyed')
# break
# if one is disarmed
# remove the bombID from the activeBomb
# if all bombs are disarmed (no activeBomb left)
# print('The city of ' + self.name + ' has been cleansed')
# break
class Bomb(threading.Thread):
def __init__(self, name, bombID, time, puzzle, answer, hidden=False):
super(Bomb, self).__init__()
self.name = name
self.bombID = bombID
self._timer = time
self._MAXTIME = time
self._disarmed = False
self._puzzle = puzzle
self._answer = answer
self._denoted = False
self._hidden = hidden
def run(self):
# A bomb goes off!!
if not self._hidden:
print('You have ' + str(self._MAXTIME)
+ ' seconds to solve the puzzle!')
print(self._puzzle)
while True:
if self._denoted:
print('BOOM')
// Communicate to city that bomb is denoted
break
elif not self._disarmed:
if self._timer == 0:
self._denoted = True
else:
self._timer -= 1
sleep(1)
else:
print('You have successfully disarmed bomb ' + str(self.name))
// Communicate to city that this bomb is disarmed
break
def answerPuzzle(self, ans):
print('Is answer ' + str(ans) + ' ?')
if ans == self._answer:
self._disarmed = True
else:
self._denotaed = True
def __eq__(self, bomb):
return self.bombID == bomb.bombID
def __hash__(self):
return id(self)
I currently don't know what is a good way for the City class to effectively keep track of the
bomb status.
The first thought I had was to use a for loop to have the City to check all the bombs in the
City, but I found it being too stupid and inefficient
So here is the question:
What is the most efficient way of implementing the bomb and City so that the city immediately know the state change of a bomb without having to check it every second?
PS: I do NOT mean to use this program to set off real bomb, so relax :D
A good case to use queue. Here is an example of the so-called producer - consumer pattern.
The work threads will run forever till your main program is done (that is what the daemon part and the "while True" is for). They will diligently monitor the in_queue for work packages. They will process the package until none is left. So when the in_queue is joined, your work threads' jobs are done. The out_queue here is an optional downstream processing step. So you can assemble the pieces from the work threads to a summary form. Useful when they are in a function.
If you need some outputs, like each work thread will print the results out to the screen or write to one single file, don't forget to use semaphore! Otherwise, your output will stumble onto each other.
Good luck!
from threading import Thread
import Queue
in_queue = Queue.Queue()
out_queue = Queue.Queue()
def work():
while True:
try:
sonId = in_queue.get()
###do your things here
result = sonID + 1
###you can even put your thread results again in another queue here
out_queue.put(result) ###optional
except:
pass
finally:
in_queue.task_done()
for i in range(20):
t = Thread(target=work)
t.daemon = True
t.start()
for son in range(10):
in_queue.put(son)
in_queue.join()
while not out_queue.empty():
result = out_queue.get()
###do something with your result here
out_queue.task_done()
out_queue.join()
The standard way of doing something like this is to use a queue - one thread watches the queue and waits for an object to handle (allowing it to idle happily), and the other thread pushes items onto the queue.
Python has the queue module (Queue in 2.x). Construct a queue in your listener thread and get() on it - this will block until something gets put on.
In your other thread, when a relevant event occurs, push it onto the queue and the listener thread will wake up and handle it. If you do this in a loop, you have the behaviour you want.
The easiest way would be to use a scheduler library. E.g. https://docs.python.org/2/library/sched.html. Using this you can simply schedule bombs to call a function or method at the time they go off. This is what I would recommend if you did not wanted to learn about threads.
E.g.
import sched
s = sched.scheduler(time.time, time.sleep)
class Bomb():
def explode(self):
if not self._disarmed:
print "BOOM"
def __init__(self, time):
s.enter(self._MAXTIME, 1, self.explode)
However, that way you will not learn about threads.
If you really want to use threads directly, then you can simply let the bombs call sleep until it is their time to go off. E.g.
class Bomb(threading.Thread)
def run(self):
time.sleep.(self._MAXTIME)
if not self._disarmed:
print "BOOM"
However, this is not a nice way to handle threads, since the threads will block your application. You will not be able to exit the application until you stop the threads. You can avoid this by making the thread a daemon thread. bomb.daemon = True.
In some cases, the best way to handle this is to actually "wake up" each second and check the status of the world. This may be the case when you need to perform some cleanup actions when the thread is stopped. E.g. You may need to close a file. Checking each second may seem wasteful, but it is actually the proper way to handle such problems. Modern desktop computers are mostly idle. To be interrupted for a few milliseconds each second will not cause them much sweat.
class Bomb(threading.Thread)
def run(self):
while not self._disarmed:
if time.now() > self.time_to_explode:
print "BOOM"
break
else:
time.sleep.(1)
Before you start "practising threading with python", I think it is important to understand Python threading model - it is Java threading model, but comes with a more restrictive option:
https://docs.python.org/2/library/threading.html
The design of this module is loosely based on Java’s threading model.
However, where Java makes locks and condition variables basic behavior
of every object, they are separate objects in Python. Python’s Thread
class supports a subset of the behavior of Java’s Thread class;
currently, there are no priorities, no thread groups, and threads
cannot be destroyed, stopped, suspended, resumed, or interrupted. The
static methods of Java’s Thread class, when implemented, are mapped to
module-level functions.
Locks being in separate objects, and not per-object, following the diagram below, means less independent scheduling even when different objects are accessed - because possibly even same locks are necessary.
For some python implementation - threading is not really fully concurrent:
http://uwpce-pythoncert.github.io/EMC-Python300-Spring2015/html_slides/07-threading-and-multiprocessing.html#slide-5
A thread is the entity within a process that can be scheduled for
execution
Threads are lightweight processes, run in the address space of an OS
process.
These threads share the memory and the state of the process. This
allows multiple threads access to data in the same scope.
Python threads are true OS level threads
Threads can not gain the performance advantage of multiple processors
due to the Global Interpreter Lock (GIL)
http://uwpce-pythoncert.github.io/EMC-Python300-Spring2015/html_slides/07-threading-and-multiprocessing.html#slide-6
And this (from above slide):

A couple of questions about calling methods outside of QThread from within the QThread - is my design flawed?

I have an application that has a GUI thread and many different worker threads. In this application, I have a functions.py module, which contains a lot of different "utility" functions that are used all over the application.
Yesterday the application has been released and some users (a minority, but still) has reported problems with the application crashing. I looked over my code and noticed a possible design flaw, and would like to check with the lovely people of SO and see if I am right and if this is indeed a flaw.
Suppose I have this defined in my functions.py module:
class Functions:
solveComputationSignal = Signal(str)
updateStatusSignal = Signal(int, str)
text = None
#classmethod
def setResultText(self, text):
self.text = text
#classmethod
def solveComputation(cls, platform, computation, param=None):
#Not the entirety of the method is listed here
result = urllib.urlopen(COMPUTATION_URL).read()
if param is None:
cls.solveComputationSignal.emit(result)
else:
cls.solveAlternateComputation(platform, computation)
while not self.text:
time.sleep(3)
return self.text if self.text else False
#classmethod
def updateCurrentStatus(cls, platform, statusText):
cls.updateStatusSignal.emit(platform, statusText)
I think these methods in themselves are fine. The two signals defined here are connected to in the GUI thread. The first signal pops-up a dialog in which the computation is presented. The GUI thread calls the setResultText() method and sets the resulting string as entered by the user (if anyone knows of a better way to wait until the user has inputted the text other than sleeping and waiting for self.text to become True, please let me know). The solveAlternateComputation is another method in the same class that solves the computation automatically, however, it too calls the setResultText() method that sets the resulting text.
The second signal updates the statusBar text of the main GUI as well.
What's worse is that I think the above design, while perhaps flawed, is not the problem.
The problem lies, I believe, in the way I call these methods, whihch is from the worker threads (note that I have multiple similar workers, all of which are different "platforms")
Assume I have this (and I do):
class WorkerPlatform1(QThread):
#Init and other methods are here
def run(self):
#Thread does its job here, but then when it needs to present the
#computation, instead of emitting a signal, this is what I do
self.f = functions.Functions
result = self.f.solveComputation(platform, computation)
if result:
#Go on with the task
else:
self.f.updateCurrentStatus(platform, "Error grabbing computation!")
In this case I think that my flaw is that the thread itself is not emitting any signals, but rather calling callables residing outside of that thread directly. Am I right in thinking that this could cause my application to crash? Although the faulty module is reported as QtGui4.dll
One more thing: both of these methods in the Functions class are accessed by many threads almost simultaneously. Is this even advisable - have methods residing outside of a thread be accessed by many threads all at the same time? Can it so happen that I "confuse" my program? The reason I am asking is because people who say that the application is not crashing report that, very often, the solveComputation() returns the incorrect text - not all the time, but very often. Since that COMPUTATION_URL's server can take some time to respond (even 10+ seconds), is it possible that, once a thread calls that method, while the urllib library is still waiting for server response, in that time another thread can call it, causing it to use a different COMPUTATION_URL, which will result in it returning an incorrect value on some cases?
Finally, I am thinking of solutions: for my first (crashing) problem, do you think the proper solution would be to directly emit a Signal from the thread itself, and then connect it in the GUI thread? Is that the right way to go about it?
Secondly, for the solveComputation returning incorrect values, would I solve it by moving that method (and accompanying methods) to every Worker class? then I could call them directly and hopefully have the correct response - or, dozens of different responses (since I have that many threads) - for every thread?
Thank you all and I apologize for the wall of text.
EDIT: I would like to add that when running in console with some users, this error appears QObject: Cannot create children for a parent that is in a different thread.
(Parent is QLabel(0x4795500), parent's thread is QThread(0x2d3fd90), current thread is WordpressCreator(0x49f0548)
Your design is flawed if you really are using your Functions class like this with classmethods storing results on class attributes, being shared amongst multiple workers. It should be using all instance methods, and each thread should be using an instance of this class:
class Functions(QObject):
solveComputationSignal = pyqtSignal(str)
updateStatusSignal = pyqtSignal(int, str)
def __init__(self, parent=None):
super(Functions, self).__init__(parent)
self.text = ""
def setResultText(self, text):
self.text = text
def solveComputation(self, platform, computation, param=None):
result = urllib.urlopen(COMPUTATION_URL).read()
if param is None:
self.solveComputationSignal.emit(result)
else:
self.solveAlternateComputation(platform, computation)
while not self.text:
time.sleep(3)
return self.text if self.text else False
def updateCurrentStatus(self, platform, statusText):
self.updateStatusSignal.emit(platform, statusText)
# worker_A
def run(self):
...
f = Functions()
# worker_B
def run(self):
...
f = Functions()
Also, for doing your urlopen, instead of doing sleeps to check for when it is ready, you can make use of the QNetworkAccessManager to make your requests and use signals to be notified when results are ready.

right way to run some code with timeout in Python

I looked online and found some SO discussing and ActiveState recipes for running some code with a timeout. It looks there are some common approaches:
Use thread that run the code, and join it with timeout. If timeout elapsed - kill the thread. This is not directly supported in Python (used private _Thread__stop function) so it is bad practice
Use signal.SIGALRM - but this approach not working on Windows!
Use subprocess with timeout - but this is too heavy - what if I want to start interruptible task often, I don't want fire process for each!
So, what is the right way? I'm not asking about workarounds (eg use Twisted and async IO), but actual way to solve actual problem - I have some function and I want to run it only with some timeout. If timeout elapsed, I want control back. And I want it to work on Linux and Windows.
A completely general solution to this really, honestly does not exist. You have to use the right solution for a given domain.
If you want timeouts for code you fully control, you have to write it to cooperate. Such code has to be able to break up into little chunks in some way, as in an event-driven system. You can also do this by threading if you can ensure nothing will hold a lock too long, but handling locks right is actually pretty hard.
If you want timeouts because you're afraid code is out of control (for example, if you're afraid the user will ask your calculator to compute 9**(9**9)), you need to run it in another process. This is the only easy way to sufficiently isolate it. Running it in your event system or even a different thread will not be enough. It is also possible to break things up into little chunks similar to the other solution, but requires very careful handling and usually isn't worth it; in any event, that doesn't allow you to do the same exact thing as just running the Python code.
What you might be looking for is the multiprocessing module. If subprocess is too heavy, then this may not suit your needs either.
import time
import multiprocessing
def do_this_other_thing_that_may_take_too_long(duration):
time.sleep(duration)
return 'done after sleeping {0} seconds.'.format(duration)
pool = multiprocessing.Pool(1)
print 'starting....'
res = pool.apply_async(do_this_other_thing_that_may_take_too_long, [8])
for timeout in range(1, 10):
try:
print '{0}: {1}'.format(duration, res.get(timeout))
except multiprocessing.TimeoutError:
print '{0}: timed out'.format(duration)
print 'end'
If it's network related you could try:
import socket
socket.setdefaulttimeout(number)
I found this with eventlet library:
http://eventlet.net/doc/modules/timeout.html
from eventlet.timeout import Timeout
timeout = Timeout(seconds, exception)
try:
... # execution here is limited by timeout
finally:
timeout.cancel()
For "normal" Python code, that doesn't linger prolongued times in C extensions or I/O waits, you can achieve your goal by setting a trace function with sys.settrace() that aborts the running code when the timeout is reached.
Whether that is sufficient or not depends on how co-operating or malicious the code you run is. If it's well-behaved, a tracing function is sufficient.
An other way is to use faulthandler:
import time
import faulthandler
faulthandler.enable()
try:
faulthandler.dump_tracebacks_later(3)
time.sleep(10)
finally:
faulthandler.cancel_dump_tracebacks_later()
N.B: The faulthandler module is part of stdlib in python3.3.
If you're running code that you expect to die after a set time, then you should write it properly so that there aren't any negative effects on shutdown, no matter if its a thread or a subprocess. A command pattern with undo would be useful here.
So, it really depends on what the thread is doing when you kill it. If its just crunching numbers who cares if you kill it. If its interacting with the filesystem and you kill it , then maybe you should really rethink your strategy.
What is supported in Python when it comes to threads? Daemon threads and joins. Why does python let the main thread exit if you've joined a daemon while its still active? Because its understood that someone using daemon threads will (hopefully) write the code in a way that it wont matter when that thread dies. Giving a timeout to a join and then letting main die, and thus taking any daemon threads with it, is perfectly acceptable in this context.
I've solved that in that way:
For me is worked great (in windows and not heavy at all) I'am hope it was useful for someone)
import threading
import time
class LongFunctionInside(object):
lock_state = threading.Lock()
working = False
def long_function(self, timeout):
self.working = True
timeout_work = threading.Thread(name="thread_name", target=self.work_time, args=(timeout,))
timeout_work.setDaemon(True)
timeout_work.start()
while True: # endless/long work
time.sleep(0.1) # in this rate the CPU is almost not used
if not self.working: # if state is working == true still working
break
self.set_state(True)
def work_time(self, sleep_time): # thread function that just sleeping specified time,
# in wake up it asking if function still working if it does set the secured variable work to false
time.sleep(sleep_time)
if self.working:
self.set_state(False)
def set_state(self, state): # secured state change
while True:
self.lock_state.acquire()
try:
self.working = state
break
finally:
self.lock_state.release()
lw = LongFunctionInside()
lw.long_function(10)
The main idea is to create a thread that will just sleep in parallel to "long work" and in wake up (after timeout) change the secured variable state, the long function checking the secured variable during its work.
I'm pretty new in Python programming, so if that solution has a fundamental errors, like resources, timing, deadlocks problems , please response)).
solving with the 'with' construct and merging solution from -
Timeout function if it takes too long to finish
this thread which work better.
import threading, time
class Exception_TIMEOUT(Exception):
pass
class linwintimeout:
def __init__(self, f, seconds=1.0, error_message='Timeout'):
self.seconds = seconds
self.thread = threading.Thread(target=f)
self.thread.daemon = True
self.error_message = error_message
def handle_timeout(self):
raise Exception_TIMEOUT(self.error_message)
def __enter__(self):
try:
self.thread.start()
self.thread.join(self.seconds)
except Exception, te:
raise te
def __exit__(self, type, value, traceback):
if self.thread.is_alive():
return self.handle_timeout()
def function():
while True:
print "keep printing ...", time.sleep(1)
try:
with linwintimeout(function, seconds=5.0, error_message='exceeded timeout of %s seconds' % 5.0):
pass
except Exception_TIMEOUT, e:
print " attention !! execeeded timeout, giving up ... %s " % e

Waiting on event with Twisted and PB

I have a python app that uses multiple threads and I am curious about the best way to wait for something in python without burning cpu or locking the GIL.
my app uses twisted and I spawn a thread to run a long operation so I do not stomp on the reactor thread. This long operation also spawns some threads using twisted's deferToThread to do something else, and the original thread wants to wait for the results from the defereds.
What I have been doing is this
while self._waiting:
time.sleep( 0.01 )
which seemed to disrupt twisted PB's objects from receiving messages so I thought sleep was locking the GIL. Further investigation by the posters below revealed however that it does not.
There are better ways to wait on threads without blocking the reactor thread or python posted below.
If you're already using Twisted, you should never need to "wait" like this.
As you've described it:
I spawn a thread to run a long operation ... This long operation also spawns some threads using twisted's deferToThread ...
That implies that you're calling deferToThread from your "long operation" thread, not from your main thread (the one where reactor.run() is running). As Jean-Paul Calderone already noted in a comment, you can only call Twisted APIs (such as deferToThread) from the main reactor thread.
The lock-up that you're seeing is a common symptom of not following this rule. It has nothing to do with the GIL, and everything to do with the fact that you have put Twisted's reactor into a broken state.
Based on your loose description of your program, I've tried to write a sample program that does what you're talking about based entirely on Twisted APIs, spawning all threads via Twisted and controlling them all from the main reactor thread.
import time
from twisted.internet import reactor
from twisted.internet.defer import gatherResults
from twisted.internet.threads import deferToThread, blockingCallFromThread
def workReallyHard():
"'Work' function, invoked in a thread."
time.sleep(0.2)
def longOperation():
for x in range(10):
workReallyHard()
blockingCallFromThread(reactor, startShortOperation, x)
result = blockingCallFromThread(reactor, gatherResults, shortOperations)
return 'hooray', result
def shortOperation(value):
workReallyHard()
return value * 100
shortOperations = []
def startShortOperation(value):
def done(result):
print 'Short operation complete!', result
return result
shortOperations.append(
deferToThread(shortOperation, value).addCallback(done))
d = deferToThread(longOperation)
def allDone(result):
print 'Long operation complete!', result
reactor.stop()
d.addCallback(allDone)
reactor.run()
Note that at the point in allDone where the reactor is stopped, you could fire off another "long operation" and have it start the process all over again.
Have you tried condition variables? They are used like
condition = Condition()
def consumer_in_thread_A():
condition.acquire()
try:
while resource_not_yet_available:
condition.wait()
# Here, the resource is available and may be
# consumed
finally:
condition.release()
def produce_in_thread_B():
# ... create resource, whatsoever
condition.acquire()
try:
condition.notify_all()
finally:
condition.release()
Condition variables act as locks (acquire and release), but their main purpose is to provide the control mechanism which allows to wait for them to be notify-d or notify_all-d.
I recently found out that calling
time.sleep( X ) will lock the GIL for
the entire time X and therefore freeze
ALL python threads for that time
period.
You found wrongly -- this is definitely not how it works. What's the source where you found this mis-information?
Anyway, then you clarify (in comments -- better edit your Q!) that you're using deferToThread and your problem with this is that...:
Well yes I defer the action to a
thread and give twisted a callback.
But the parent thread needs to wait
for the whole series of sub threads to
complete before it can move onto a new
set of sub threads to spawn
So use as the callback a method of an object with a counter -- start it at 0, increment it by one every time you're deferring-to-thread and decrement it by one in the callback method.
When the callback method sees that the decremented counter has gone back to 0, it knows that we're done waiting "for the whole series of sub threads to complete" and then the time has come to "move on to a new set of sub threads to spawn", and thus, in that case only, calls the "spawn a new set of sub threads" function or method -- it's that easy!
E.g. (net of typos &c as this is untested code, just to give you the idea)...:
class Waiter(object):
def __init__(self, what_next, *a, **k):
self.counter = 0
self.what_next = what_next
self.a = a
self.k = k
def one_more(self):
self.counter += 1
def do_wait(self, *dont_care):
self.counter -= 1
if self.counter == 0:
self.what_next(*self.a, **self.k)
def spawn_one_thread(waiter, long_calculation, *a, **k):
waiter.one_more()
d = threads.deferToThread(long_calculation, *a, **k)
d.addCallback(waiter.do_wait)
def spawn_all(waiter, list_of_lists_of_functions_args_and_kwds):
if not list_of_lists_of_functions_args_and_kwds:
return
if waiter is None:
waiter=Waiter(spawn_all, list_of_lists_of_functions_args_and_kwds)
this_time = list_of_list_of_functions_args_and_kwds.pop(0)
for f, a, k in this_time:
spawn_one_thread(waiter, f, *a, **k)
def start_it_all(list_of_lists_of_functions_args_and_kwds):
spawn_all(None, list_of_lists_of_functions_args_and_kwds)
According to the Python source, time.sleep() does not hold the GIL.
http://code.python.org/hg/trunk/file/98e56689c59c/Modules/timemodule.c#l920
Note the use of Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS, as documented here:
http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock
The threading module allows you to spawn a thread, which is then represented by a Thread object. That object has a join method that you can use to wait for the subthread to complete.
See http://docs.python.org/library/threading.html#module-threading

Is this Python code thread safe?

import time
import threading
class test(threading.Thread):
def __init__ (self):
threading.Thread.__init__(self)
self.doSkip = False
self.count = 0
def run(self):
while self.count<9:
self.work()
def skip(self):
self.doSkip = True
def work(self):
self.count+=1
time.sleep(1)
if(self.doSkip):
print "skipped"
self.doSkip = False
return
print self.count
t = test()
t.start()
while t.count<9:
time.sleep(2)
t.skip()
Thread-safe in which way? I don't see any part you might want to protect here.
skip may reset the doSkip at any time, so there's not much point in locking it. You don't have any resources that are accessed at the same time - so IMHO nothing can be corrupted / unsafe in this code.
The only part that might run differently depending on locking / counting is how many "skip"s do you expect on every call to .skip(). If you want to ensure that every skip results in a skipped call to .work(), you should change doSkip into a counter that is protected by a lock on both increment and compare/decrement. Currently one thread might turn doSkip on after the check, but before the doSkip reset. It doesn't matter in this example, but in some real situation (with more code) it might make a difference.
Whenever the test of a mutex boolean ( e.g. if(self.doSkip) ) is separate from the set of the mutex boolean you will probably have threading problems.
The rule is that your thread will get swapped out at the most inconvenient time. That is, after the test and before the set. Moving them closer together reduces the window for screw-ups but does not eliminate them. You almost always need a specially created mechanism from the language or kernel to fully close that window.
The threading library has Semaphores that can be used to synchronize threads and/or create critical sections of code.
Apparently there isn't any critical resource, so I'd say it's thread-safe.
But as usual you can't predict in which order the two threads will be blocked/run by the scheduler.
This is and will thread safe as long as you don't share data between threads.
If an other thread needs to read/write data to your thread class, then this won't be thread safe unless you protect data with some synchronization mechanism (like locks).
To elaborate on DanM's answer, conceivably this could happen:
Thread 1: t.skip()
Thread 2: if self.doSkip: print 'skipped'
Thread 1: t.skip()
Thread 2: self.doSkip = False
etc.
In other words, while you might expect to see one "skipped" for every call to t.skip(), this sequence of events would violate that.
However, because of your sleep() calls, I think this sequence of events is actually impossible.
(unless your computer is running really slowly)

Categories

Resources