Handling errors in python with multiple tasks - python

I have multiple tasks in my Python workflow and would like to know the best way to handle errors.
class Task1():
    is_ready = False
    def run(self):
        try:
            a = 0/0
            # some more operations
            self._is_ready = True
        except:
            print 'logging errors'
class Task2():
    _is_ready = False
    def run(self):
        try:
            a = 1
            # some more operations
            self._is_ready = True
        except:
            print 'logging errors'
class Workflow():
    def run(self, ):
        self.task1 = Task1()
        self.task2 = Task2()
        self.task1.run()
        if self.task1.is_ready:
            self.task2.run()
w = Workflow()
w.run()
I basically want to run the tasks sequentially, with each task running only if the previous one finished without errors,
i.e. if task1 runs fine, then process task2.
Can you please let me know whether the above approach is the right way to do this?
I have 10 tasks in total, and adding a chain of if statements for all of them does not sound like a great way to do it.

There are really two questions here. One is about how to arrange the tasks in a sequence, the other is about how to break the sequence if one task fails.
If you want any kind of scalability, you will need an iterable of tasks, so that you can run a for loop over it. Using nested ifs is totally impractical as you yourself noticed. The basic structure will be conceptually something like this:
tasks = [Task1(), Task2(), ...]
for task in tasks:
    task.run()
    if task.failed():
        break
None of the portions of the loop need to appear as written. The loop itself can be replaced with any, all or next. The status check can be an attribute check, a method call or even an implied exception.
You have a number of options for how to decide if a task failed:
Use an internal flag as you are currently doing. Make sure that the flag has a consistent name in all the task classes (notice the typo _is_ready in Task2). This is a bit of overkill unless you have a use-case that really requires it, since it provides redundant information, and not very elegantly at that.
Use a return value in run. This is much nicer because you can write
for task in tasks:
    if not task.run():
        break
Or alternatively (as @MichaelButscher cleverly suggested)
all(task.run() for task in tasks)
In either case, your task should look like this:
class Task1:
    def run(self):
        try:
            ...  # Some stuff
        except SomeException:
            # Log error
            return False
        return True
Just let the error propagate from the task implementation:
class Task1:
    def run(self):
        try:
            ...  # Some stuff
        except SomeException:
            # Log error
            raise
I prefer this method to all the others because that's what exceptions are basically for in the first place. In this case, your loop will be even more minimalistic:
for task in tasks:
    task.run()
Or alternatively, but more obscurely
any(task.run() for task in tasks)
Or even
from collections import deque
# The generator expression needs its own parentheses when it is not the only argument.
deque((task.run() for task in tasks), maxlen=0)
The second two options are really there only for reference purposes. If you go with exceptions, just write the basic for loop: it's plenty elegant enough and by far the least arcane.
Finally, I would recommend another fundamental change. If your tasks are truly arbitrary in nature, then you should consider allowing any callable taking no arguments to be a task. There is no particular need to restrict yourself to classes having a run method. If you need to have a task class, you can rename the method you call run to __call__, and all your instances will be callable with the () operator. The code would look conceptually like this then:
class CallableClass:
    def __call__(self):
        try:
            ...  # Do something
        except:
            # Log error
            raise
def callable_function():
    try:
        ...  # Do something
    except:
        # Log error
        raise
for task in tasks:
    task()
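As a rough sketch of how the pieces above could fit together, the loop can live in a small runner that accepts any zero-argument callables and stops at the first one that raises. Unlike the bare for loop, this version catches the exception at the workflow level so that it can report which tasks completed. The names run_in_order, load, transform and save are made up for illustration and are not from the question.

```python
import logging

def run_in_order(tasks):
    """Call each task in sequence; stop at the first one that raises."""
    completed = []
    for task in tasks:
        try:
            task()
        except Exception:
            # Log the failure once, here, and skip the remaining tasks.
            logging.exception("task %r failed; skipping the rest", task)
            break
        completed.append(task.__name__)
    return completed

# Hypothetical tasks: plain functions are enough, no Task class is needed.
def load():
    pass

def transform():
    raise ValueError("bad input")

def save():
    pass

completed = run_in_order([load, transform, save])
print(completed)
```

If the caller should see the failure as well, re-raising after logging (instead of break) keeps the same structure.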

If the run() methods return a boolean success value and each task should run only if the previous one succeeded, then it could be done like this:
class Workflow():
    def run(self, ):
        task_list = (Task1(), Task2(), Task3(), ...)
        success = all(t.run() for t in task_list)

Related

Training a model based on time rather than epochs [duplicate]

In Python, for a toy example:
for x in range(0, 3):
    # Call function A(x)
I want to continue the for loop if function A takes more than five seconds by skipping it so I won't get stuck or waste time.
After some searching, I realized a subprocess or thread may help, but I have no idea how to implement it here.
I think creating a new process may be overkill. If you're on Mac or a Unix-based system, you should be able to use signal.SIGALRM to forcibly time out functions that take too long. This will work on functions that are idling for network or other issues that you absolutely can't handle by modifying your function. I have an example of using it in this answer:
Option for SSH to timeout after a short time? ClientAlive & ConnectTimeout don't seem to do what I need them to do
Editing my answer in here, though I'm not sure I'm supposed to do that:
import signal
class TimeoutException(Exception):  # Custom exception class
    pass
def timeout_handler(signum, frame):  # Custom signal handler
    raise TimeoutException
# Change the behavior of SIGALRM
signal.signal(signal.SIGALRM, timeout_handler)
for i in range(3):
    # Start the timer. Once 5 seconds are over, a SIGALRM signal is sent.
    signal.alarm(5)
    # This try/except block ensures that
    # you'll catch TimeoutException when it's sent.
    try:
        A(i)  # Whatever your function that might hang
    except TimeoutException:
        continue  # continue the for loop if function A takes more than 5 seconds
    else:
        # Reset the alarm
        signal.alarm(0)
This basically sets a timer for 5 seconds, then tries to execute your code. If it fails to complete before time runs out, a SIGALRM is sent, which we catch and turn into a TimeoutException. That forces you to the except block, where your program can continue.
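The same idea can be packaged as a context manager, so the alarm is always disarmed and the previous handler restored, even when the body raises something else. This is only a sketch under a few assumptions: it is Unix-only, it must run in the main thread, and it uses signal.setitimer instead of signal.alarm so that fractional seconds work. The names time_limit and run_all are invented here, and time.sleep stands in for the function that might hang.

```python
import signal
import time
from contextlib import contextmanager

class TimeoutException(Exception):
    pass

@contextmanager
def time_limit(seconds):
    def handler(signum, frame):
        raise TimeoutException("timed out after %s seconds" % seconds)
    # Install our handler and remember the old one so it can be restored.
    previous = signal.signal(signal.SIGALRM, handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, previous)

def run_all(delays, limit):
    """Run one (simulated) call per delay, skipping any that exceed the limit."""
    completed, skipped = [], []
    for delay in delays:
        try:
            with time_limit(limit):
                time.sleep(delay)  # stand-in for A(i)
            completed.append(delay)
        except TimeoutException:
            skipped.append(delay)
    return completed, skipped

completed, skipped = run_all([0.01, 1.0, 0.02], limit=0.2)
print(completed, skipped)
```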
Maybe someone will find this decorator useful; it is based on TheSoundDefense's answer:
import time
import signal
class TimeoutException(Exception):  # Custom exception class
    pass
def break_after(seconds=2):
    def timeout_handler(signum, frame):  # Custom signal handler
        raise TimeoutException
    def decorator(func):
        def wrapper(*args, **kwargs):
            signal.signal(signal.SIGALRM, timeout_handler)
            signal.alarm(seconds)
            try:
                res = func(*args, **kwargs)
                signal.alarm(0)  # Clear alarm
                return res
            except TimeoutException:
                print('Oops, timeout: %s sec reached.' % seconds, func.__name__, args, kwargs)
                return
        return wrapper
    return decorator
Test:
@break_after(3)
def test(a, b, c):
    return time.sleep(10)
>>> test(1,2,3)
Oops, timeout: 3 sec reached. test (1, 2, 3) {}
If you can break your work up and check every so often, that's almost always the best solution. But sometimes that's not possible; for example, maybe you're reading a file off a slow file share that every once in a while just hangs for 30 seconds. To deal with that internally, you'd have to restructure your whole program around an async I/O loop.
If you don't need to be cross-platform, you can use signals on *nix (including Mac and Linux), APCs on Windows, etc. But if you need to be cross-platform, that doesn't work.
So, if you really need to do it concurrently, you can, and sometimes you have to. In that case, you probably want to use a process for this, not a thread. You can't really kill a thread safely, but you can kill a process, and it can be as safe as you want it to be. Also, if the thread is taking 5+ seconds because it's CPU-bound, you don't want to fight with it over the GIL.
There are two basic options here.
First, you can put the code in another script and run it with subprocess:
subprocess.check_call([sys.executable, 'other_script.py', arg, other_arg],
                      timeout=5)
Since this is going through normal child-process channels, the only communication you can use is some argv strings, a success/failure return value (actually a small integer, but that's not much better), and optionally a hunk of text going in and a chunk of text coming out.
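For a concrete picture of the subprocess route, here is a minimal sketch. When the timeout expires, check_call kills the child and raises subprocess.TimeoutExpired; a non-zero exit status raises subprocess.CalledProcessError. The helper name run_with_timeout is made up, and inline `-c` snippets stand in for other_script.py so the example is self-contained.

```python
import subprocess
import sys

def run_with_timeout(code, seconds):
    """Run a Python snippet in a child process and report how it ended."""
    try:
        subprocess.check_call([sys.executable, "-c", code], timeout=seconds)
    except subprocess.TimeoutExpired:
        return "timeout"  # the child has already been killed at this point
    except subprocess.CalledProcessError as err:
        return "failed (%d)" % err.returncode
    return "ok"

quick = run_with_timeout("pass", 10)
slow = run_with_timeout("import time; time.sleep(30)", 0.5)
failing = run_with_timeout("raise SystemExit(3)", 10)
print(quick, slow, failing)
```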
Alternatively, you can use multiprocessing to spawn a thread-like child process:
# target and args must be passed by keyword; the first positional parameter is group.
p = multiprocessing.Process(target=func, args=args)
p.start()
p.join(5)
if p.is_alive():
    p.terminate()
As you can see, this is a little more complicated, but it's better in a few ways:
You can pass arbitrary Python objects (at least anything that can be pickled) rather than just strings.
Instead of having to put the target code in a completely independent script, you can leave it as a function in the same script.
It's more flexible—e.g., if you later need to, say, pass progress updates, it's very easy to add a queue in either or both directions.
The big problem with any kind of parallelism is sharing mutable data—e.g., having a background task update a global dictionary as part of its work (which your comments say you're trying to do). With threads, you can sort of get away with it, but race conditions can lead to corrupted data, so you have to be very careful with locking. With child processes, you can't get away with it at all. (Yes, you can use shared memory, as Sharing state between processes explains, but this is limited to simple types like numbers, fixed arrays, and types you know how to define as C structures, and it just gets you back to the same problems as threads.)
Ideally, you arrange things so you don't need to share any data while the process is running—you pass in a dict as a parameter and get a dict back as a result. This is usually pretty easy to arrange when you have a previously-synchronous function that you want to put in the background.
But what if, say, a partial result is better than no result? In that case, the simplest solution is to pass the results over a queue. You can do this with an explicit queue, as explained in Exchanging objects between processes, but there's an easier way.
If you can break the monolithic process into separate tasks, one for each value (or group of values) you wanted to stick in the dictionary, you can schedule them on a Pool—or, even better, a concurrent.futures.Executor. (If you're on Python 2.x or 3.1, see the backport futures on PyPI.)
Let's say your slow function looked like this:
def spam():
    global d
    for meat in get_all_meats():
        count = get_meat_count(meat)
        d[meat] = d.get(meat, 0) + count
Instead, you'd do this:
import concurrent.futures
def spam_one(meat):
    count = get_meat_count(meat)
    return meat, count
with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    results = executor.map(spam_one, get_canned_meats(), timeout=5)
    for (meat, count) in results:
        d[meat] = d.get(meat, 0) + count
As many results as you get within 5 seconds get added to the dict; if that isn't all of them, the rest are abandoned, and a TimeoutError is raised (which you can handle however you want—log it, do some quick fallback code, whatever).
And if the tasks really are independent (as they are in my stupid little example, but of course they may not be in your real code, at least not without a major redesign), you can parallelize the work for free just by removing that max_workers=1. Then, if you run it on an 8-core machine, it'll kick off 8 workers and give them each 1/8th of the work to do, and things will get done faster. (Usually not 8x as fast, but often 3-6x as fast, which is still pretty nice.)
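To show the partial-result behaviour of map with a timeout, here is a small sketch. Note the swap: it uses a ThreadPoolExecutor rather than the ProcessPoolExecutor discussed above, purely so the snippet runs without a `__main__` guard. The map/timeout interface is the same, but a thread that overruns cannot be killed, so the with block still waits for it on exit. The names count_one and collect and the sample data are invented for illustration.

```python
import concurrent.futures
import time

def count_one(item):
    # Hypothetical slow lookup: sleep for the given delay, then return a count.
    name, delay = item
    time.sleep(delay)
    return name, len(name)

def collect(items, timeout):
    """Gather as many results as arrive before the timeout expires."""
    totals = {}
    timed_out = False
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(items)) as executor:
        results = executor.map(count_one, items, timeout=timeout)
        try:
            for name, count in results:
                totals[name] = totals.get(name, 0) + count
        except concurrent.futures.TimeoutError:
            timed_out = True  # keep what we have; the rest is abandoned
    return totals, timed_out

totals, timed_out = collect([("ham", 0.0), ("spam", 0.0), ("bacon", 1.0)], timeout=0.3)
print(totals, timed_out)
```

Results come back in input order, so one slow item near the front would also hold back the faster items behind it.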
This seems like a better idea (sorry, I am not sure of the Python names of things yet):
import signal
def signal_handler(signum, frame):
    raise Exception("Timeout!")
signal.signal(signal.SIGALRM, signal_handler)
signal.alarm(3)  # Three seconds
try:
    for x in range(0, 3):
        A(x)  # Call function A(x)
except Exception as msg:
    print("Timeout!")
signal.alarm(0)  # Reset
The comments are correct in that you should check inside. Here is a potential solution. Note that an asynchronous function (by using a thread for example) is different from this solution. This is synchronous which means it will still run in series.
import time
# Define the function before the loop that calls it.
def someFunction():
    start = time.time()
    while time.time() - start < 5:
        # do your normal function
        pass
    return
for x in range(0, 3):
    someFunction()

How to easily find a coroutine that has timed out?

Key problem: after asyncio.wait(aws, timeout=1, return_when=FIRST_COMPLETED), is there a simple way to check whether a returned task has timed out?
This is an extended question.
The scenario is like this:
The total number of coroutines is unknown.
The server only allows 10 connections.
The server sometimes returns a seemingly correct result that is actually wrong (e.g. an incorrect page).
The server sometimes does not return any data.
I want to retrieve as much of the data as possible.
So, to get the data faster, I need to limit the number of coroutines, check each returned page, and apply a timeout.
There are currently two simple methods.
1. Similar to threads, use a queue to build a coroutine pool with 10 infinite-loop coroutines. I don't really like this, although in fact it works very fast.
2. Use the high-level asyncio API of Python 3.7 to simplify the structure of the program, using while tasks, asyncio.wait and return_when.
Here I ran into the problem of how to find the coroutines that have timed out.
I built a simple demo:
import asyncio
async def test(delaytime):
    print(f"begin {delaytime}")
    await asyncio.sleep(delaytime)
    print(f"finish {delaytime} ")
async def main():
    # the number of tasks is unknown, range(10) is just a demo
    allts = list(range(10))
    ts = []
    while len(ts)<5:
        arg = allts.pop()
        t = asyncio.create_task(test(arg))
        t.arg = arg
        ts.append(t)
    while ts:
        dones,pendings = await asyncio.wait(ts,timeout=2,return_when=asyncio.FIRST_COMPLETED)
        for t in dones:
            # if check t.result() is error , i can append ts again
            print(t.arg,"is done")
            ts.remove(t)
        while len(ts)<5:
            if len(allts):
                arg = allts.pop()
                t = asyncio.create_task(test(arg))
                t.arg = arg
                ts.append(t)
            else:
                break
        # for t in pendings:
        #     # if can check t is timeout , i can append ts again
        #     pass
if __name__=="__main__":
    asyncio.run(main())
After debugging, I know that with return_when=asyncio.FIRST_COMPLETED, every task that asyncio.wait returns, other than the completed ones, ends up in pendings.
However, I can't tell which tasks have timed out.
I thought about using wait_for, but wait_for has no return_when argument.
Is there a simple way to identify the timed-out tasks so that I can re-add them to ts?
The issue is that the approach of using wait(return_when=FIRST_COMPLETED) is fundamentally incompatible with the use of timeout. Since different tasks have started at different times, a single timeout argument obviously can't apply to all tasks. If you want to use return_when=FIRST_COMPLETED, wrap each task in asyncio.wait_for:
t = asyncio.create_task(asyncio.wait_for(test(arg), 2))
Then, when the task is done, you can use t.exception() to test whether it has timed out, in which case it will return an asyncio.TimeoutError instance (and None if the task finished normally). This check should only be performed on the done tasks.
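A minimal sketch of that suggestion, assuming a stand-in coroutine fetch and a plain dict (args) to remember which argument each task was started with; neither name comes from the question. Each task carries its own deadline through wait_for, so asyncio.wait itself needs no timeout argument.

```python
import asyncio

async def fetch(delay):
    # Hypothetical stand-in for the real request.
    await asyncio.sleep(delay)
    return delay

async def main(delays, limit):
    args = {}      # task -> argument it was started with
    pending = set()
    for delay in delays:
        task = asyncio.create_task(asyncio.wait_for(fetch(delay), limit))
        args[task] = delay
        pending.add(task)
    finished, timed_out = [], []
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if isinstance(task.exception(), asyncio.TimeoutError):
                timed_out.append(args[task])  # candidate for a retry
            else:
                finished.append(task.result())
    return sorted(finished), sorted(timed_out)

finished, timed_out = asyncio.run(main([0.01, 0.05, 1.0], limit=0.2))
print(finished, timed_out)
```

In the question's loop, the timed_out branch is where the argument would be pushed back onto allts for another attempt.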

How can I reproduce the race conditions in this python code reliably?

Context
I recently posted a timer class for review on Code Review. I'd had a gut feeling there were concurrency bugs as I'd once seen 1 unit test fail, but was unable to reproduce the failure. Hence my post to code review.
I got some great feedback highlighting various race conditions in the code. (I thought) I understood the problem and the solution, but before making any fixes, I wanted to expose the bugs with a unit test. When I tried, I realised it was difficult. Various stack exchange answers suggested I'd have to control the execution of threads to expose the bug(s) and any contrived timing would not necessarily be portable to a different machine. This seemed like a lot of accidental complexity beyond the problem I was trying to solve.
Instead I tried using the best static analysis (SA) tool for python, PyLint, to see if it'd pick out any of the bugs, but it couldn't. Why could a human find the bugs through code review (essentially SA), but a SA tool could not?
Afraid of trying to get Valgrind working with python (which sounded like yak-shaving), I decided to have a bash at fixing the bugs without reproducing them first. Now I'm in a pickle.
Here's the code now.
from threading import Timer, Lock
from time import time
class NotRunningError(Exception): pass
class AlreadyRunningError(Exception): pass
class KitchenTimer(object):
    '''
    Loosely models a clockwork kitchen timer with the following differences:
        You can start the timer with arbitrary duration (e.g. 1.2 seconds).
        The timer calls back a given function when time's up.
        Querying the time remaining has 0.1 second accuracy.
    '''
    PRECISION_NUM_DECIMAL_PLACES = 1
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"
    TIMEUP = "TIMEUP"
    def __init__(self):
        self._stateLock = Lock()
        with self._stateLock:
            self._state = self.STOPPED
            self._timeRemaining = 0
    def start(self, duration=1, whenTimeup=None):
        '''
        Starts the timer to count down from the given duration and call whenTimeup when time's up.
        '''
        with self._stateLock:
            if self.isRunning():
                raise AlreadyRunningError
            else:
                self._state = self.RUNNING
                self.duration = duration
                self._userWhenTimeup = whenTimeup
                self._startTime = time()
                self._timer = Timer(duration, self._whenTimeup)
                self._timer.start()
    def stop(self):
        '''
        Stops the timer, preventing whenTimeup callback.
        '''
        with self._stateLock:
            if self.isRunning():
                self._timer.cancel()
                self._state = self.STOPPED
                self._timeRemaining = self.duration - self._elapsedTime()
            else:
                raise NotRunningError()
    def isRunning(self):
        return self._state == self.RUNNING
    def isStopped(self):
        return self._state == self.STOPPED
    def isTimeup(self):
        return self._state == self.TIMEUP
    @property
    def timeRemaining(self):
        if self.isRunning():
            self._timeRemaining = self.duration - self._elapsedTime()
        return round(self._timeRemaining, self.PRECISION_NUM_DECIMAL_PLACES)
    def _whenTimeup(self):
        with self._stateLock:
            self._state = self.TIMEUP
            self._timeRemaining = 0
            if callable(self._userWhenTimeup):
                self._userWhenTimeup()
    def _elapsedTime(self):
        return time() - self._startTime
Question
In the context of this code example, how can I expose the race conditions, fix them, and prove they're fixed?
Extra points
extra points for a testing framework suitable for other implementations and problems rather than specifically to this code.
Takeaway
My takeaway is that the technical solution for reproducing the identified race conditions is to control the synchronisation of two threads so that they execute in the order that exposes a bug. The important point here is that they are already identified race conditions. The best way I've found to identify race conditions is to put your code up for code review and encourage more expert people to analyse it.
Traditionally, forcing race conditions in multithreaded code is done with semaphores, so you can force a thread to wait until another thread has achieved some edge condition before continuing.
For example, your object has some code to check that start is not called if the object is already running. You could force this condition to make sure it behaves as expected by doing something like this:
starting a KitchenTimer
having the timer block on a semaphore while in the running state
starting the same timer in another thread
catching AlreadyRunningError
To do some of this you may need to extend the KitchenTimer class. Formal unit tests will often use mock objects which are defined to block at critical times. Mock objects are a bigger topic than I can address here, but googling "python mock object" will turn up a lot of documentation and many implementations to choose from.
Here's a way that you could force your code to throw AlreadyRunningError:
import threading
class TestKitchenTimer(KitchenTimer):
    _runningLock = threading.Condition()
    def start(self, duration=1, whenTimeUp=None):
        KitchenTimer.start(self, duration, whenTimeUp)
        with self._runningLock:
            print("waiting on _runningLock")
            self._runningLock.wait()
    def resume(self):
        with self._runningLock:
            self._runningLock.notify()
timer = TestKitchenTimer()
# Start the timer in a subthread. This thread will block as soon as
# it is started.
thread_1 = threading.Thread(target = timer.start, args = (10, None))
thread_1.start()
# Attempt to start the timer in a second thread, causing it to throw
# an AlreadyRunningError.
try:
    thread_2 = threading.Thread(target = timer.start, args = (10, None))
    thread_2.start()
except AlreadyRunningError:
    print("AlreadyRunningError")
timer.resume()
timer.stop()
Reading through the code, identify some of the boundary conditions you want to test, then think about where you would need to pause the timer to force that condition to arise, and add Conditions, Semaphores, Events, etc. to make it happen. For example, what happens if, just as the timer runs the whenTimeup callback, another thread tries to stop it? You can force that condition by making the timer wait as soon as it has entered _whenTimeup:
import threading
from time import sleep
class TestKitchenTimer(KitchenTimer):
    _runningLock = threading.Condition()
    def _whenTimeup(self):
        with self._runningLock:
            self._runningLock.wait()
        KitchenTimer._whenTimeup(self)
    def resume(self):
        with self._runningLock:
            self._runningLock.notify()
def TimeupCallback():
    print("TimeupCallback was called")
timer = TestKitchenTimer()
# The timer thread will block when the timer expires, but before the callback
# is invoked.
thread_1 = threading.Thread(target = timer.start, args = (1, TimeupCallback))
thread_1.start()
sleep(2)
# The timer is now blocked. In the parent thread, we stop it.
timer.stop()
print("timer is stopped: %r" % timer.isStopped())
# Now allow the countdown thread to resume.
timer.resume()
Subclassing the class you want to test isn't an awesome way to instrument it for testing: you'll have to override basically all of the methods in order to test race conditions in each one, and at that point there's a good argument to be made that you're not really testing the original code. Instead, you may find it cleaner to put the semaphores right in the KitchenTimer object but initialized to None by default, and have your methods check if testRunningLock is not None: before acquiring or waiting on the lock. Then you can force races on the actual code that you're submitting.
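To illustrate the idea of a hook that defaults to None, here is a self-contained sketch. Starter is a hypothetical stand-in for a class with an unlocked check-then-act sequence (it is not KitchenTimer), and the after_check parameter, the two Events and pause_first_caller are all invented for this example. The hook parks the first caller between the check and the update, so a second caller deterministically slips past the same check.

```python
import threading

class Starter:
    """Hypothetical stand-in for a class with an unlocked check-then-act."""

    def __init__(self, after_check=None):
        self.running = False
        self.starts = 0
        self._after_check = after_check  # test hook; stays None in production

    def start(self):
        if self.running:
            raise RuntimeError("already running")
        if self._after_check is not None:
            self._after_check()  # test-only pause between check and update
        self.running = True
        self.starts += 1

checked = threading.Event()  # set once the first caller has passed the check
proceed = threading.Event()  # set when the first caller may continue

def pause_first_caller():
    # Only the first thread to arrive is held back; later callers pass through.
    if not checked.is_set():
        checked.set()
        proceed.wait(timeout=5)

starter = Starter(after_check=pause_first_caller)
worker = threading.Thread(target=starter.start)
worker.start()
checked.wait(timeout=5)  # the worker is now parked between check and update
starter.start()          # the main thread passes the same check
proceed.set()
worker.join()
race_reproduced = starter.starts == 2
print(race_reproduced)
```

Once start() holds a lock around both the check and the update, the same test would block on that lock instead of slipping through, so it needs restructuring at that point (for example, a timed acquire in the second caller).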
Here is some reading on Python mock frameworks that may be helpful. In fact, I'm not sure that mocks would be helpful in testing this code: it's almost entirely self-contained and doesn't rely on many external objects. But mock tutorials sometimes touch on issues like these. I haven't used any of these, but the documentation on them looks like a good place to get started:
Getting Started with Mock
Using Fudge
Python Mock Testing Techniques and Tools
The most common solution to testing thread (un)safe code is to start a lot of threads and hope for the best. The problem I, and I can imagine others, have with this is that it relies on chance and it makes tests 'heavy'.
As I ran into this a while ago, I wanted to go for precision instead of brute force. The result is a piece of test code that causes race conditions by letting the threads race neck and neck.
Sample racy code
spam = []
def set_spam():
    spam[:] = foo()
    use(spam)
If set_spam is called from several threads, a race condition exists between modification and use of spam. Let's try to reproduce it consistently.
How to cause race-conditions
import threading
class TriggeredThread(threading.Thread):
    def __init__(self, sequence=None, *args, **kwargs):
        self.sequence = sequence
        self.lock = threading.Condition()
        self.event = threading.Event()
        threading.Thread.__init__(self, *args, **kwargs)
    def __enter__(self):
        # Block until this thread has been triggered.
        self.lock.acquire()
        while not self.event.is_set():
            self.lock.wait()
        self.event.clear()
    def __exit__(self, *args):
        # Hand control to the next thread in the sequence.
        self.lock.release()
        if self.sequence:
            next(self.sequence).trigger()
    def trigger(self):
        with self.lock:
            self.event.set()
            self.lock.notify()
Then to demonstrate the use of this thread:
import itertools
spam = []  # Use a list to share values across threads.
results = []  # Register the results.
def set_spam():
    thread = threading.current_thread()
    with thread:  # Acquires the lock.
        # Set 'spam' to thread name
        spam[:] = [thread.name]
    # Thread 'releases' the lock upon exiting the context.
    # The next thread is triggered and this thread waits for a trigger.
    with thread:
        # Since each thread overwrites the content of the 'spam'
        # list, this should only result in True for the last thread.
        results.append(spam == [thread.name])
threads = [
    TriggeredThread(name='a', target=set_spam),
    TriggeredThread(name='b', target=set_spam),
    TriggeredThread(name='c', target=set_spam)]
# Create a shifted sequence of threads and share it among the threads.
thread_sequence = itertools.cycle(threads[1:] + threads[:1])
for thread in threads:
    thread.sequence = thread_sequence
# Start each thread
for thread in threads:
    thread.start()
# Trigger first thread.
# That thread will trigger the next thread, and so on.
threads[0].trigger()
# Wait for each thread to finish.
for thread in threads:
    thread.join()
# The last thread 'has won the race' overwriting the value
# for 'spam', thus [False, False, True].
assert results == [False, False, True], "race condition triggered"
# If set_spam were thread-safe, all results would be true and this
# assertion would hold instead:
# assert results == [True, True, True], "code is thread-safe"
I think I explained enough about this construction so you can implement it for your own situation. I think this fits the 'extra points' section quite nicely:
extra points for a testing framework suitable for other implementations and problems rather than specifically to this code.
Solving race-conditions
Shared variables
Each threading issue is solved in its own specific way. In the example above I caused a race condition by sharing a value across threads. Similar problems can occur when using global variables, such as a module attribute. The key to solving such issues may be to use thread-local storage:
# The thread local storage is a global.
# This may seem weird at first, but it isn't actually shared among threads.
data = threading.local()
data.spam = []  # This list only exists in this thread.
results = []  # Results *are* shared though.

def set_spam():
    thread = threading.current_thread()
    # 'get' or set the 'spam' list. This actually creates a new list.
    # If the list was shared among threads this would cause a race-condition.
    data.spam = getattr(data, 'spam', [])
    with thread:
        data.spam[:] = [thread.name]
    with thread:
        results.append(data.spam == [thread.name])

# Start the threads as in the example above.
assert all(results)  # All results should be True.
Concurrent reads/writes
A common threading issue is multiple threads reading from and/or writing to a shared data holder concurrently. This is typically solved with a read-write lock. Implementations differ: you may choose a lock that favours readers, one that favours writers, or one that grants access in no particular order.
I'm sure there are examples out there describing such locking techniques. I may write an example later as this is quite a long answer already. ;-)
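For illustration, a reader-preferring read-write lock could be sketched roughly as follows. This is a minimal Python 3 sketch, not a production implementation: the class name ReadWriteLock, its method names, and the small demo around it are assumptions made up for this example, and writers can starve if readers keep arriving.

```python
import threading


class ReadWriteLock:
    """Reader-preferring lock: many readers or one writer (illustrative sketch)."""

    def __init__(self):
        self._readers = 0                      # number of active readers
        self._readers_lock = threading.Lock()  # protects the reader counter
        self._writer_lock = threading.Lock()   # held by a writer, or by the group of readers

    def acquire_read(self):
        with self._readers_lock:
            self._readers += 1
            if self._readers == 1:
                # The first reader locks writers out.
                self._writer_lock.acquire()

    def release_read(self):
        with self._readers_lock:
            self._readers -= 1
            if self._readers == 0:
                # The last reader lets writers in again.
                self._writer_lock.release()

    def acquire_write(self):
        self._writer_lock.acquire()

    def release_write(self):
        self._writer_lock.release()


# Hypothetical demo: five writers add to a shared value, five readers observe it.
rw_lock = ReadWriteLock()
shared = {"value": 0}
seen = []


def write(n):
    rw_lock.acquire_write()
    try:
        shared["value"] += n
    finally:
        rw_lock.release_write()


def read():
    rw_lock.acquire_read()
    try:
        seen.append(shared["value"])
    finally:
        rw_lock.release_read()


workers = [threading.Thread(target=write, args=(i,)) for i in range(1, 6)]
workers += [threading.Thread(target=read) for _ in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The sketch relies on the fact that a plain threading.Lock may be released by a different thread than the one that acquired it, which is why the last reader can release the lock taken by the first.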
Notes
Have a look at the threading module documentation and experiment with it a bit. As each threading issue is different, different solutions apply.
While on the subject of threading, have a look at the Python GIL (Global Interpreter Lock). It is important to note that threading may not actually be the best approach in optimizing performance (but this is not your goal). I found this presentation pretty good: https://www.youtube.com/watch?v=zEaosS1U5qY
You can test it by using a lot of threads:
import sys, random, threading
from time import time, sleep

def timeup():
    sys.stdout.write("Timer:: Up %f" % time())

def trdfunc(kt, tid):
    while True:
        sleep(1)
        if not kt.isRunning():
            if kt.start(1, timeup):
                sys.stdout.write("[%d]: started\n" % tid)
        else:
            if random.random() < 0.1:
                kt.stop()
                sys.stdout.write("[%d]: stopped\n" % tid)
        sys.stdout.write("[%d] remains %f\n" % (tid, kt.timeRemaining))

kt = KitchenTimer()  # the class under test, defined elsewhere
kt.start(1, timeup)
for i in range(1, 100):
    threading.Thread(target=trdfunc, args=(kt, i)).start()
trdfunc(kt, 0)
A couple of problems I see:
When a thread sees the timer as not running and tries to start it, the code generally raises an exception because of a context switch between the test and the start. I think raising an exception is too much. Alternatively, you could provide an atomic testAndStart function.
A similar problem occurs with stop. You can implement a testAndStop function.
Even this code from the timeRemaining function:
    if self.isRunning():
        self._timeRemaining = self.duration - self._elapsedTime()
needs some sort of atomicity; perhaps you need to grab a lock before testing isRunning.
If you plan to share this class between threads, you need to address these issues.
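As a rough sketch of what atomic test-and-act methods might look like, the following Python 3 example guards both the check and the state change with a single lock. LockedTimer and its method names are hypothetical stand-ins invented for this illustration; they are not the timer class under discussion, and the callback handling is left out.

```python
import threading
import time


class LockedTimer:
    """Hypothetical timer whose check-and-act steps are atomic."""

    def __init__(self):
        self._lock = threading.Lock()
        self._end_time = None  # None means "not running"

    def _is_running(self):
        # Callers must already hold self._lock.
        return self._end_time is not None and time.time() < self._end_time

    def test_and_start(self, duration):
        """Start the timer only if it is not running; return True on success."""
        with self._lock:
            if self._is_running():
                return False
            self._end_time = time.time() + duration
            return True

    def test_and_stop(self):
        """Stop the timer only if it is running; return True on success."""
        with self._lock:
            if not self._is_running():
                return False
            self._end_time = None
            return True

    def time_remaining(self):
        # The running check and the subtraction happen under the same lock.
        with self._lock:
            if not self._is_running():
                return 0.0
            return max(0.0, self._end_time - time.time())


# Twenty threads race to start the same timer; only one should succeed.
timer = LockedTimer()
outcomes = []


def try_start():
    outcomes.append(timer.test_and_start(60))


starters = [threading.Thread(target=try_start) for _ in range(20)]
for t in starters:
    t.start()
for t in starters:
    t.join()
```

Returning False instead of raising lets callers treat "someone else got there first" as a normal outcome rather than an error.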
In general, this is not a viable solution. You can reproduce this race condition with a debugger: set breakpoints at some locations in the code; when a thread hits one of them, freeze it and run the code until another thread hits a breakpoint, then freeze that thread and unfreeze the first one. With this technique you can interleave thread execution in any way you like.
The problem is that the more threads and code you have, the more ways there are to interleave their side effects. The number of interleavings grows exponentially. There is no viable way to test this in general; it is possible only in some simple cases.
The solutions to this problem are well known: write code that is aware of its side effects, control side effects with synchronisation primitives such as locks, semaphores or queues, or use immutable data where possible.
Maybe a more practical way is to use runtime checks to enforce the correct call order. For example (pseudocode):
import threading

class RaceConditionDetected(Exception):
    pass

class RacyObject:
    def __init__(self):
        self.__lock = threading.Lock()
        self.__cnt = 0
        ...

    def isReadyAndLocked(self):
        with self.__lock:
            if self.__cnt % 2 != 0:
                # another thread is ready to start the job
                return False
            if self.__is_ready:
                self.__cnt += 1
                return True
            # job is in progress or isn't ready yet
            return False

    def doJobAndRelease(self):
        with self.__lock:
            if self.__cnt % 2 != 1:
                raise RaceConditionDetected("Incorrect order")
            self.__cnt += 1
            do_job()
This code will throw an exception if you don't call isReadyAndLocked before calling doJobAndRelease. This can be tested easily using only one thread.
obj = RacyObject()
...
# correct usage
if obj.isReadyAndLocked():
    obj.doJobAndRelease()

simplifying threading in python

I am looking for a way to ease my threaded code.
There are a lot of places in my code where I do something like:
for arg in array:
    t = Thread(target=myFunction, args=(arg,))
    t.start()
i.e. running the same function, each time with different parameters, in threads.
This is of course a simplified version of the real code, and usually the code inside the for loop is ~10-20 lines long, that cannot be made simple by using one auxiliary function like myFunction in the example above (had that been the case, I could've just used a thread pool).
Also, this scenario is very, very common in my code, so there are tons of lines which I consider redundant. It would help me a lot if I didn't need to handle all this boilerplate code, but instead be able to do something like:
for arg in array:
    with threaded():
        myFunction(arg)
i.e. somehow threaded() takes every line of code inside it and runs it in a separate thread.
I know that context managers aren't supposed to be used in such situations, that it's probably a bad idea and will require an ugly hack, but nonetheless - can it be done, and how?
How about this:
for arg in array:
    def _thread(arg=arg):  # bind the current value of arg
        # code here
        print(arg)
    t = Thread(target=_thread)
    t.start()
Additionally, with decorators, you can sugar it up a little:
def spawn_thread(func):
    t = Thread(target=func)
    t.start()
    return t

for arg in array:
    @spawn_thread
    def _thread(arg=arg):  # bind the current value of arg
        # code here
        print(arg)
Would a thread pool help you here? Many implementations for Python exist, for example this one.
P.S: still interested to know what your exact use-case is
What you want is a kind of "contextual thread pool".
Take a look at the ThreadPool class in this module, designed to be used similar to the manner you've given. Use would be something like this:
with ThreadPool() as pool:
    for arg in array:
        pool.add_thread(target=myFunction, args=[arg])
Failures in any task given to a ThreadPool will flag an error, and perform the standard error backtrace handling.
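The module referred to above is not reproduced here, but the standard library offers a comparable with-block pattern in concurrent.futures.ThreadPoolExecutor (Python 3.2+). A minimal sketch, in which my_function and array are placeholder stand-ins for the question's myFunction and array:

```python
from concurrent.futures import ThreadPoolExecutor


def my_function(arg):
    # Placeholder for the real per-item work.
    return arg * 2


array = [1, 2, 3, 4]

# Leaving the with-block waits for all submitted tasks to finish.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(my_function, arg) for arg in array]

# result() returns the task's return value, or re-raises its exception.
results = [future.result() for future in futures]
```

Note that this is a different API from the ThreadPool class mentioned above: errors surface when result() is called rather than through that module's own error handling.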
I think you're over-complicating it. This is the "pattern" I use:
# util.py
import threading

def start_thread(func, *args):
    thread = threading.Thread(target=func, args=args)
    thread.daemon = True  # don't keep the process alive for this thread
    thread.start()
    return thread

# in another module
import util
...
for arg in array:
    util.start_thread(myFunction, arg)
I don't see the big deal about having to create myFunction. You could even define the function inline with the function that starts it.
def do_stuff():
    def thread_main(arg):
        print("I'm a new thread with arg=%s" % arg)
    for arg in array:
        util.start_thread(thread_main, arg)
If you're creating a large number of threads, a thread pool definitely makes more sense. You can easily make your own with the Queue and threading modules. Basically create a jobs queue, create N worker threads, give each thread a "pointer" to the queue and have them pull jobs from the queue and process them.
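The queue-based pool described above might be sketched like this. It is a minimal illustration for Python 3 (where the module is named queue rather than Queue); worker, run_pool and the None sentinel are choices made for this example, not part of any existing library.

```python
import queue
import threading


def worker(jobs, results):
    """Pull (func, arg) jobs from the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        func, arg = job
        # list.append is thread-safe in CPython, so a plain list is enough here.
        results.append(func(arg))


def run_pool(func, args, num_workers=4):
    """Run func(arg) for every arg on a fixed number of worker threads."""
    jobs = queue.Queue()
    results = []
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for arg in args:
        jobs.put((func, arg))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker so each one exits
    for t in threads:
        t.join()
    return results


# Results arrive in completion order, so sort them for a stable view.
squares = sorted(run_pool(lambda x: x * x, range(5)))
```

This sketch does no error handling: a job that raises ends its worker thread, so real code would probably catch exceptions inside the loop and report them.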
