Python: Event handler for background task completion

I have a _global_variable = Big_Giant_Class(). Big_Giant_Class takes a long time to build, but it also has constantly refreshing 'live data' behind it, so I always want as new an instance of it as possible. It's not IO-bound, just a load of CPU computations.
Further, my program has a number of functions that reference that global instance of Big_Giant_Class.
I'm trying to figure out a way to create Big_Giant_Class in an endless loop (so I always have the latest and greatest!), but without it blocking all the other functions that reference _global_variable.
Conceptually, I kind of figure the code would look like:
import asyncio
import time

class Big_Giant_Class():
    def __init__(self, val, sleep_me=False):
        self.val = val
        if sleep_me:
            time.sleep(10)

    def print_val(self):
        print(self.val)

async def run_loop():
    while True:
        new_instance_value = await asyncio.run(Big_Giant_Class(val=1))  # <-- takes a while
        # somehow assign new_instance_value to _global_variable when it's done!

def do_stuff_that_cant_be_blocked():
    global _global_variable
    return _global_variable.print_val()

_global_variable = Big_Giant_Class(val=0)

if __name__ == "__main__":
    asyncio.run(run_loop())  # <-- maybe I have to do this somewhere?
    for i in range(20):
        do_stuff_that_cant_be_blocked()
        time.sleep(1)
Conceptual output:
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
1
The kicker is, I have a number of functions [i.e., do_stuff_that_cant_be_blocked] that can't be blocked.
I simply want them to use the last _global_variable value (which gets periodically updated by some non-blocking...thing?). That's why I figure I can't await the results, because that would block the other functions?
Is it possible to do something like that? I've done very little asyncio, so apologies if this is basic. I'm open to any other packages that might be able to do this (although I don't think Trio works, because I have required packages that are incompatible with it).
Thanks for any help in advance!

So you have two CPU-bound "loops" in your program. Python has a quirky threading model: CPython cannot execute Python bytecode in two threads at once, so it cannot truly run two calculations simultaneously. Threading and async only let Python appear to do two things at once.
Threading lets you "do" two things because Python switches between the threads and makes progress on each, but it never runs both at the same time.
Async lets you "do" two things if you can await the operation: while Python awaits, it can jump off and do something else. Awaiting a CPU-bound operation, however, will not let it jump away and do other work.
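That said, asyncio can hand CPU-bound work to an executor and await the result without blocking the event loop. A minimal sketch of that idea (my addition, assuming Python 3.7+ and that Big_Giant_Class instances from the question are picklable):

import asyncio
import concurrent.futures

def build_instance():
    # Heavy CPU work runs in a worker process, not on the event loop.
    return Big_Giant_Class(val=1)

async def run_loop():
    global _global_variable
    loop = asyncio.get_running_loop()
    with concurrent.futures.ProcessPoolExecutor() as pool:
        while True:
            # The event loop stays responsive while the worker computes;
            # the instance must be picklable to cross the process boundary.
            _global_variable = await loop.run_in_executor(pool, build_instance)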
The easiest solution is to use a thread. There will be some time where both loops are blocked because work is being done on the other, but the work will be split roughly 50/50 between the threads.
from threading import Thread

_global_variable = some_initial_value

def update_global():
    global _global_variable
    while True:
        _global_variable = get_new_global_instance()
        call_some_event()

def main():
    background_thread = Thread(target=update_global, daemon=True)
    background_thread.start()
    while True:
        do_important_work()
The harder, but truly parallel, version would be to use a Process instead of a Thread; it would also need shared state, a queue, or something similar.
https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes
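A minimal sketch of that Process-plus-queue variant, reusing the placeholder names from the threading example above (and assuming whatever get_new_global_instance returns is picklable):

import queue
from multiprocessing import Process, Queue

def produce_instances(q):
    # Runs in its own process, so it gets a whole CPU core to itself.
    while True:
        q.put(get_new_global_instance())

def main():
    global _global_variable
    q = Queue()
    Process(target=produce_instances, args=(q,), daemon=True).start()
    while True:
        try:
            # Drain without blocking; keep only the newest instance.
            while True:
                _global_variable = q.get_nowait()
        except queue.Empty:
            pass
        do_important_work()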

Python Lock always re-acquired by the same thread

I got this as an interview problem a few days ago. I don't really know parallel programming, and the obvious solution I've tried isn't working.
The question is: write two functions, one printing "foo", one printing "bar", that will be run on separate threads. How to ensure output is always:
foo
bar
foo
bar
...
Here's what I've tried:
from threading import Lock, Thread

class ThreadPrinting:
    def __init__(self):
        self.lock = Lock()
        self.count = 10

    def foo(self):
        for _ in range(self.count):
            with self.lock:
                print("foo")

    def bar(self):
        for _ in range(self.count):
            with self.lock:
                print("bar")

if __name__ == "__main__":
    tp = ThreadPrinting()
    t1 = Thread(target=tp.foo)
    t2 = Thread(target=tp.bar)
    t1.start()
    t2.start()
But this just produces 10 "foo"s and then 10 "bar"s. Seemingly the same thread manages to loop around and re-acquire the lock before the other. What might be the solution here? Thank you.
this just produces 10 "foo"s and then 10 "bar"s. Seemingly the same thread manages to loop around and re-acquire the lock before the other.
No surprise there. The problem with using a threading.Lock object (a.k.a., a "mutex") in this way is that, like the (default) mutexes in most programming systems, it makes no attempt to be fair.
The very next thing that either of your two threads does after it releases the lock is immediately try to acquire the lock again. Meanwhile, the other thread is sleeping (a.k.a. "blocked"), waiting for its turn to acquire the lock.
The goal of most operating systems, when there is heavy demand for CPU time, is to maximize the amount of useful work that the CPU(s) can do. The best way to do that is to award the lock to the thread that already is running on some CPU instead of wasting time waking up some other thread that is sleeping.
That strategy works well in programs that use locks the way locks were meant to be used—that is to say, programs where the threads spend most of their time unlocked, and only briefly grab a lock, every so often, in order to examine or update some (group of) shared variables.
In order to make your threads take turns printing their messages, you are going to need to find some way to let the threads explicitly say to each other, "It's your turn now."
See my comments on your question for a hint about how you might do that.
@Solomon Slow provided a great explanation and pointed me in the right direction. I initially wanted some kind of a "lock with value" that can be acquired only conditionally. But this doesn't really exist, and busy-waiting in a cycle of "acquire lock - check variable - loop around" is not great. Instead I solved this with a pair of threading.Condition objects that the threads use to talk to each other. I'm sure there's a simpler solution, but here's mine:
from threading import Thread, Condition

class ThreadPrinting:
    def __init__(self):
        self.fooCondition = Condition()
        self.barCondition = Condition()
        self.count = 10

    def foo(self):
        for _ in range(self.count):
            with self.fooCondition:
                self.fooCondition.wait()
            print("foo")
            with self.barCondition:
                self.barCondition.notify()

    def bar(self):
        with self.fooCondition:
            self.fooCondition.notify()  # Bootstrap the cycle
        for _ in range(self.count):
            with self.barCondition:
                self.barCondition.wait()
            print("bar")
            with self.fooCondition:
                self.fooCondition.notify()

if __name__ == "__main__":
    tp = ThreadPrinting()
    t1 = Thread(target=tp.foo)
    t2 = Thread(target=tp.bar)
    t1.start()
    t2.start()
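A simpler variant of the same hand-off uses a pair of semaphores; unlike a notify with no waiter, a semaphore release is never lost, which also removes the bootstrap step. A sketch (not from the original post):

from threading import Semaphore, Thread

foo_turn = Semaphore(1)  # foo may print first
bar_turn = Semaphore(0)  # bar must wait until foo releases it

def foo():
    for _ in range(10):
        foo_turn.acquire()   # wait for our turn
        print("foo")
        bar_turn.release()   # hand the turn to bar

def bar():
    for _ in range(10):
        bar_turn.acquire()
        print("bar")
        foo_turn.release()

t1 = Thread(target=foo)
t2 = Thread(target=bar)
t1.start()
t2.start()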
The way I did it was just to have the first thread send 'foo', then sleep 1 second before the second thread sends 'bar'. Both functions sleep for 2 seconds between sends, so they always alternate, sending one word per second.
from threading import Thread
import time

def foo():
    num = 0
    while num < 10:
        print("foo")
        num = num + 1
        time.sleep(2)

def bar():
    num = 0
    while num < 10:
        print("bar")
        num = num + 1
        time.sleep(2)

t1 = Thread(target=foo)
t2 = Thread(target=bar)
t1.start()
time.sleep(1)
t2.start()
I tried this for 100 of each 'foo' and 'bar' and it still alternated.

Python - Why doesn't multithreading increase the speed of my code?

I tried improving my code by running this with and without using two threads:
from threading import Lock
from threading import Thread
import time

start_time = time.clock()
arr_lock = Lock()
arr = range(5000)

def do_print():
    # Disable arr access to other threads; they will have to wait if they need to read
    a = 0
    while True:
        arr_lock.acquire()
        if len(arr) > 0:
            item = arr.pop(0)
            print item
            arr_lock.release()
            b = 0
            for a in range(30000):
                b = b + 1
        else:
            arr_lock.release()
            break

thread1 = Thread(target=do_print)
thread1.start()
thread1.join()

print time.clock() - start_time, "seconds"
When running 2 threads my code's run time increased. Does anyone know why this happened, or perhaps know a different way to increase the performance of my code?
The primary reason you aren't seeing any performance improvements with multiple threads is because your program only enables one thread to do anything useful at a time. The other thread is always blocked.
Two things:
Remove the print statement that's invoked inside the lock. print statements drastically impact performance and timing. Also, the I/O channel to stdout is essentially single threaded, so you've built another implicit lock into your code. So let's just remove the print statement.
Use a proper sleep technique instead of "spin locking" and counting up from 0 to 30000. That's just going to burn a core needlessly.
Try this as your main loop
while True:
    arr_lock.acquire()
    if len(arr) > 0:
        item = arr.pop(0)
        arr_lock.release()
        time.sleep(0)
    else:
        arr_lock.release()
        break
This should run slightly better... I would even advocate getting the sleep statement out altogether so you can just let each thread have a full quantum.
However, because each thread is either doing "nothing" (sleeping or blocked on acquire) or just doing a single pop call on the array while in the lock, the majority of the time spent is going to be in the acquire/release calls instead of actually operating on the array. Hence, multiple threads aren't going to make your program run faster.
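As an aside (not part of the original answer), the manual lock-around-a-list pattern can be replaced with queue.Queue, which does the locking internally; a minimal Python 3 sketch:

import queue
from threading import Thread

work = queue.Queue()
for i in range(5000):
    work.put(i)

def drain():
    while True:
        try:
            item = work.get_nowait()  # thread-safe pop; no explicit lock needed
        except queue.Empty:
            break

threads = [Thread(target=drain) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

The locking cost doesn't disappear, but the code is simpler and the critical section is as small as it can be.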

Python threading design

I'm trying to write a mini-game that allows me to practice my Python threading skills. The game involves timed bombs and the cities that contain them.
Here is my code:
import threading
from time import sleep

class City(threading.Thread):
    def __init__(self, name):
        super().__init__()
        self.name = name
        self.bombs = []
        self.activeBomb = []
        self.bombID = 0
        self.exploded = False

    def addBomb(self, name, time, puzzle, answer, hidden=False):
        self.bombs.append(Bomb(name, self.bombID, time, puzzle, answer, hidden))
        self.activeBomb.append(self.bombID)
        self.bombID += 1

    def run(self):
        for b in self.bombs:
            b.start()
        while True:
            # listen to the bombs in self.bombs  # The part that I don't know how
            # if one explodes
            #     print(self.name + ' has been destroyed')
            #     break
            # if one is disarmed
            #     remove the bombID from activeBomb
            # if all bombs are disarmed (no activeBomb left)
            #     print('The city of ' + self.name + ' has been cleansed')
            #     break
            pass

class Bomb(threading.Thread):
    def __init__(self, name, bombID, time, puzzle, answer, hidden=False):
        super(Bomb, self).__init__()
        self.name = name
        self.bombID = bombID
        self._timer = time
        self._MAXTIME = time
        self._disarmed = False
        self._puzzle = puzzle
        self._answer = answer
        self._denoted = False
        self._hidden = hidden

    def run(self):
        # A bomb goes off!!
        if not self._hidden:
            print('You have ' + str(self._MAXTIME)
                  + ' seconds to solve the puzzle!')
            print(self._puzzle)
        while True:
            if self._denoted:
                print('BOOM')
                # Communicate to city that bomb is detonated
                break
            elif not self._disarmed:
                if self._timer == 0:
                    self._denoted = True
                else:
                    self._timer -= 1
                    sleep(1)
            else:
                print('You have successfully disarmed bomb ' + str(self.name))
                # Communicate to city that this bomb is disarmed
                break

    def answerPuzzle(self, ans):
        print('Is answer ' + str(ans) + ' ?')
        if ans == self._answer:
            self._disarmed = True
        else:
            self._denoted = True

    def __eq__(self, bomb):
        return self.bombID == bomb.bombID

    def __hash__(self):
        return id(self)
I currently don't know a good way for the City class to keep track of the bomb status.
My first thought was to use a for loop to have the City check all the bombs in the City, but that seemed crude and inefficient.
So here is the question:
What is the most efficient way of implementing Bomb and City so that the city immediately knows about a bomb's state change without having to check every second?
PS: I do NOT mean to use this program to set off real bombs, so relax :D
A good case for a queue. Here is an example of the so-called producer-consumer pattern.
The work threads will run forever till your main program is done (that is what the daemon part and the "while True" are for). They will diligently monitor the in_queue for work packages and process them until none is left, so when the in_queue is joined, your work threads' jobs are done. The out_queue here is an optional downstream processing step, so you can assemble the pieces from the work threads into a summary form. Useful when they are in a function.
If you need output, such as each work thread printing results to the screen or writing to one shared file, don't forget to use a semaphore (or a lock)! Otherwise, your outputs will stumble over each other.
Good luck!
from threading import Thread
import Queue

in_queue = Queue.Queue()
out_queue = Queue.Queue()

def work():
    while True:
        try:
            sonId = in_queue.get()
            # do your things here
            result = sonId + 1
            # you can even put your thread results again in another queue here
            out_queue.put(result)  # optional
        except:
            pass
        finally:
            in_queue.task_done()

for i in range(20):
    t = Thread(target=work)
    t.daemon = True
    t.start()

for son in range(10):
    in_queue.put(son)

in_queue.join()

while not out_queue.empty():
    result = out_queue.get()
    # do something with your result here
    out_queue.task_done()

out_queue.join()
The standard way of doing something like this is to use a queue - one thread watches the queue and waits for an object to handle (allowing it to idle happily), and the other thread pushes items onto the queue.
Python has the queue module (Queue in 2.x). Construct a queue in your listener thread and get() on it - this will block until something gets put on.
In your other thread, when a relevant event occurs, push it onto the queue and the listener thread will wake up and handle it. If you do this in a loop, you have the behaviour you want.
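Applied to the question's classes, that might look like the sketch below. This is only a sketch: the event tuples and the bomb_ids constructor argument are my assumptions, not from the question.

import queue
import threading

class City(threading.Thread):
    def __init__(self, name, bomb_ids):
        super().__init__()
        self.name = name
        self.events = queue.Queue()  # each Bomb holds a reference to this
        self.active = set(bomb_ids)

    def run(self):
        while self.active:
            # Blocks idly until some bomb reports, instead of polling.
            event, bomb_id = self.events.get()
            if event == 'exploded':
                print(self.name + ' has been destroyed')
                return
            self.active.discard(bomb_id)  # event == 'disarmed'
        print('The city of ' + self.name + ' has been cleansed')

A bomb would then end its run method with city.events.put(('exploded', self.bombID)) or city.events.put(('disarmed', self.bombID)) instead of the placeholder comments in the question.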
The easiest way would be to use a scheduler library. E.g. https://docs.python.org/2/library/sched.html. Using this you can simply schedule bombs to call a function or method at the time they go off. This is what I would recommend if you did not want to learn about threads.
E.g.
import time
import sched

s = sched.scheduler(time.time, time.sleep)

class Bomb():
    def explode(self):
        if not self._disarmed:
            print "BOOM"

    def __init__(self, time):
        self._disarmed = False
        self._MAXTIME = time
        s.enter(self._MAXTIME, 1, self.explode, ())
However, that way you will not learn about threads.
If you really want to use threads directly, then you can simply let the bombs call sleep until it is their time to go off. E.g.
class Bomb(threading.Thread):
    def run(self):
        time.sleep(self._MAXTIME)
        if not self._disarmed:
            print "BOOM"
However, this is not a nice way to handle threads, since the threads will block your application. You will not be able to exit the application until you stop the threads. You can avoid this by making the thread a daemon thread. bomb.daemon = True.
In some cases, the best way to handle this is to actually "wake up" each second and check the status of the world. This may be the case when you need to perform some cleanup actions when the thread is stopped. E.g. You may need to close a file. Checking each second may seem wasteful, but it is actually the proper way to handle such problems. Modern desktop computers are mostly idle. To be interrupted for a few milliseconds each second will not cause them much sweat.
class Bomb(threading.Thread):
    def run(self):
        while not self._disarmed:
            if time.time() > self.time_to_explode:
                print "BOOM"
                break
            else:
                time.sleep(1)
Before you start "practising threading with Python", I think it is important to understand the Python threading model - it resembles the Java threading model, but is more restrictive:
https://docs.python.org/2/library/threading.html
The design of this module is loosely based on Java's threading model. However, where Java makes locks and condition variables basic behavior of every object, they are separate objects in Python. Python's Thread class supports a subset of the behavior of Java's Thread class; currently, there are no priorities, no thread groups, and threads cannot be destroyed, stopped, suspended, resumed, or interrupted. The static methods of Java's Thread class, when implemented, are mapped to module-level functions.
Locks being separate objects, and not per-object, means less independent scheduling even when different objects are accessed, because possibly the very same locks are necessary.
For some Python implementations, threading is not really fully concurrent:
http://uwpce-pythoncert.github.io/EMC-Python300-Spring2015/html_slides/07-threading-and-multiprocessing.html#slide-5
A thread is the entity within a process that can be scheduled for execution.
Threads are lightweight processes, run in the address space of an OS process.
These threads share the memory and the state of the process. This allows multiple threads access to data in the same scope.
Python threads are true OS-level threads.
Threads cannot gain the performance advantage of multiple processors due to the Global Interpreter Lock (GIL).
http://uwpce-pythoncert.github.io/EMC-Python300-Spring2015/html_slides/07-threading-and-multiprocessing.html#slide-6

How can I reproduce the race conditions in this python code reliably?

Context
I recently posted a timer class for review on Code Review. I'd had a gut feeling there were concurrency bugs as I'd once seen 1 unit test fail, but was unable to reproduce the failure. Hence my post to code review.
I got some great feedback highlighting various race conditions in the code. (I thought) I understood the problem and the solution, but before making any fixes, I wanted to expose the bugs with a unit test. When I tried, I realised it was difficult. Various stack exchange answers suggested I'd have to control the execution of threads to expose the bug(s) and any contrived timing would not necessarily be portable to a different machine. This seemed like a lot of accidental complexity beyond the problem I was trying to solve.
Instead I tried using the best static analysis (SA) tool for python, PyLint, to see if it'd pick out any of the bugs, but it couldn't. Why could a human find the bugs through code review (essentially SA), but a SA tool could not?
Afraid of trying to get Valgrind working with python (which sounded like yak-shaving), I decided to have a bash at fixing the bugs without reproducing them first. Now I'm in a pickle.
Here's the code now.
from threading import Timer, Lock
from time import time

class NotRunningError(Exception): pass
class AlreadyRunningError(Exception): pass

class KitchenTimer(object):
    '''
    Loosely models a clockwork kitchen timer with the following differences:
        You can start the timer with arbitrary duration (e.g. 1.2 seconds).
        The timer calls back a given function when time's up.
        Querying the time remaining has 0.1 second accuracy.
    '''

    PRECISION_NUM_DECIMAL_PLACES = 1
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"
    TIMEUP = "TIMEUP"

    def __init__(self):
        self._stateLock = Lock()
        with self._stateLock:
            self._state = self.STOPPED
            self._timeRemaining = 0

    def start(self, duration=1, whenTimeup=None):
        '''
        Starts the timer to count down from the given duration and call whenTimeup when time's up.
        '''
        with self._stateLock:
            if self.isRunning():
                raise AlreadyRunningError
            else:
                self._state = self.RUNNING
                self.duration = duration
                self._userWhenTimeup = whenTimeup
                self._startTime = time()
                self._timer = Timer(duration, self._whenTimeup)
                self._timer.start()

    def stop(self):
        '''
        Stops the timer, preventing whenTimeup callback.
        '''
        with self._stateLock:
            if self.isRunning():
                self._timer.cancel()
                self._state = self.STOPPED
                self._timeRemaining = self.duration - self._elapsedTime()
            else:
                raise NotRunningError()

    def isRunning(self):
        return self._state == self.RUNNING

    def isStopped(self):
        return self._state == self.STOPPED

    def isTimeup(self):
        return self._state == self.TIMEUP

    @property
    def timeRemaining(self):
        if self.isRunning():
            self._timeRemaining = self.duration - self._elapsedTime()
        return round(self._timeRemaining, self.PRECISION_NUM_DECIMAL_PLACES)

    def _whenTimeup(self):
        with self._stateLock:
            self._state = self.TIMEUP
            self._timeRemaining = 0
            if callable(self._userWhenTimeup):
                self._userWhenTimeup()

    def _elapsedTime(self):
        return time() - self._startTime
Question
In the context of this code example, how can I expose the race conditions, fix them, and prove they're fixed?
Extra points
extra points for a testing framework suitable for other implementations and problems rather than specifically to this code.
Takeaway
My takeaway is that the technical solution to reproduce the identified race conditions is to control the synchronism of two threads to ensure they execute in the order that will expose a bug. The important point here is that they are already identified race conditions. The best way I've found to identify race conditions is to put your code up for code review and encourage more expert people analyse it.
Traditionally, forcing race conditions in multithreaded code is done with semaphores, so you can force a thread to wait until another thread has achieved some edge condition before continuing.
For example, your object has some code to check that start is not called if the object is already running. You could force this condition to make sure it behaves as expected by doing something like this:
starting a KitchenTimer
having the timer block on a semaphore while in the running state
starting the same timer in another thread
catching AlreadyRunningError
To do some of this you may need to extend the KitchenTimer class. Formal unit tests will often use mock objects which are defined to block at critical times. Mock objects are a bigger topic than I can address here, but googling "python mock object" will turn up a lot of documentation and many implementations to choose from.
Here's a way that you could force your code to throw AlreadyRunningError:
import threading

class TestKitchenTimer(KitchenTimer):
    _runningLock = threading.Condition()

    def start(self, duration=1, whenTimeUp=None):
        KitchenTimer.start(self, duration, whenTimeUp)
        with self._runningLock:
            print "waiting on _runningLock"
            self._runningLock.wait()

    def resume(self):
        with self._runningLock:
            self._runningLock.notify()

timer = TestKitchenTimer()

# Start the timer in a subthread. This thread will block as soon as
# it is started.
thread_1 = threading.Thread(target = timer.start, args = (10, None))
thread_1.start()

# Attempt to start the timer in a second thread, causing it to throw
# an AlreadyRunningError.
try:
    thread_2 = threading.Thread(target = timer.start, args = (10, None))
    thread_2.start()
except AlreadyRunningError:
    print "AlreadyRunningError"

timer.resume()
timer.stop()
Reading through the code, identify some of the boundary conditions you want to test, then think about where you would need to pause the timer to force that condition to arise, and add Conditions, Semaphores, Events, etc. to make it happen. e.g. what happens if, just as the timer runs the whenTimeUp callback, another thread tries to stop it? You can force that condition by making the timer wait as soon as it's entered _whenTimeUp:
import threading
from time import sleep

class TestKitchenTimer(KitchenTimer):
    _runningLock = threading.Condition()

    def _whenTimeup(self):
        with self._runningLock:
            self._runningLock.wait()
        KitchenTimer._whenTimeup(self)

    def resume(self):
        with self._runningLock:
            self._runningLock.notify()

def TimeupCallback():
    print "TimeupCallback was called"

timer = TestKitchenTimer()

# The timer thread will block when the timer expires, but before the callback
# is invoked.
thread_1 = threading.Thread(target = timer.start, args = (1, TimeupCallback))
thread_1.start()
sleep(2)

# The timer is now blocked. In the parent thread, we stop it.
timer.stop()
print "timer is stopped: %r" % timer.isStopped()

# Now allow the countdown thread to resume.
timer.resume()
Subclassing the class you want to test isn't an awesome way to instrument it for testing: you'll have to override basically all of the methods in order to test race conditions in each one, and at that point there's a good argument to be made that you're not really testing the original code. Instead, you may find it cleaner to put the semaphores right in the KitchenTimer object but initialized to None by default, and have your methods check if testRunningLock is not None: before acquiring or waiting on the lock. Then you can force races on the actual code that you're submitting.
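A sketch of that suggestion (my illustration; the testRunningLock attribute name follows the paragraph above and is hypothetical):

from threading import Lock

class KitchenTimer(object):
    def __init__(self):
        self._stateLock = Lock()
        self.testRunningLock = None  # a test may install a threading.Condition here

    def start(self, duration=1, whenTimeup=None):
        with self._stateLock:
            pass  # state checks and Timer setup as in the original class
        if self.testRunningLock is not None:
            # Only instrumented test runs pause here; production paths skip it.
            with self.testRunningLock:
                self.testRunningLock.wait()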
Some reading on Python mock frameworks that may be helpful. In fact, I'm not sure that mocks would be helpful in testing this code: it's almost entirely self-contained and doesn't rely on many external objects. But mock tutorials sometimes touch on issues like these. I haven't used any of these, but the documentation on them looks like a good place to get started:
Getting Started with Mock
Using Fudge
Python Mock Testing Techniques and Tools
The most common solution to testing thread (un)safe code is to start a lot of threads and hope for the best. The problem I, and I can imagine others, have with this is that it relies on chance and it makes tests 'heavy'.
As I ran into this a while ago I wanted to go for precision instead of brute force. The result is a piece of test code to cause race-conditions by letting the threads race neck to neck.
Sample racey code
spam = []

def set_spam():
    spam[:] = foo()
    use(spam)
If set_spam is called from several threads, a race condition exists between modification and use of spam. Let's try to reproduce it consistently.
How to cause race-conditions
import threading

class TriggeredThread(threading.Thread):
    def __init__(self, sequence=None, *args, **kwargs):
        self.sequence = sequence
        self.lock = threading.Condition()
        self.event = threading.Event()
        threading.Thread.__init__(self, *args, **kwargs)

    def __enter__(self):
        self.lock.acquire()
        while not self.event.is_set():
            self.lock.wait()
        self.event.clear()

    def __exit__(self, *args):
        self.lock.release()
        if self.sequence:
            next(self.sequence).trigger()

    def trigger(self):
        with self.lock:
            self.event.set()
            self.lock.notify()
Then to demonstrate the use of this thread:
import itertools

spam = []     # Use a list to share values across threads.
results = []  # Register the results.

def set_spam():
    thread = threading.current_thread()
    with thread:  # Acquires the lock.
        # Set 'spam' to thread name
        spam[:] = [thread.name]
    # Thread 'releases' the lock upon exiting the context.
    # The next thread is triggered and this thread waits for a trigger.
    with thread:
        # Since each thread overwrites the content of the 'spam'
        # list, this should only result in True for the last thread.
        results.append(spam == [thread.name])

threads = [
    TriggeredThread(name='a', target=set_spam),
    TriggeredThread(name='b', target=set_spam),
    TriggeredThread(name='c', target=set_spam)]

# Create a shifted sequence of threads and share it among the threads.
thread_sequence = itertools.cycle(threads[1:] + threads[:1])
for thread in threads:
    thread.sequence = thread_sequence

# Start each thread
[thread.start() for thread in threads]

# Trigger first thread.
# That thread will trigger the next thread, and so on.
threads[0].trigger()

# Wait for each thread to finish.
[thread.join() for thread in threads]

# The last thread 'has won the race' overwriting the value
# for 'spam', thus [False, False, True].
assert results == [False, False, True], "race condition triggered"
# If set_spam were thread-safe, all results would be True instead:
# assert results == [True, True, True], "code is thread-safe"
I think I explained enough about this construction so you can implement it for your own situation. I think this fits the 'extra points' section quite nicely:
extra points for a testing framework suitable for other implementations and problems rather than specifically to this code.
Solving race-conditions
Shared variables
Each threading issue is solved in its own specific way. In the example above I caused a race-condition by sharing a value across threads. Similar problems can occur when using global variables, such as a module attribute. The key to solving such issues may be to use thread-local storage:
# The thread local storage is a global.
# This may seem weird at first, but it isn't actually shared among threads.
data = threading.local()
data.spam = []  # This list only exists in this thread.
results = []    # Results *are* shared though.

def set_spam():
    thread = threading.current_thread()
    # 'get' or set the 'spam' list. This actually creates a new list.
    # If the list was shared among threads this would cause a race-condition.
    data.spam = getattr(data, 'spam', [])
    with thread:
        data.spam[:] = [thread.name]
    with thread:
        results.append(data.spam == [thread.name])

# Start the threads as in the example above.

assert all(results)  # All results should be True.
Concurrent reads/writes
A common threading issue is the problem of multiple threads reading and/or writing to a data holder concurrently. This problem is solved by implementing a read-write lock. The actual implementation of a read-write lock may differ. You may choose a read-first lock, a write-first lock or just at random.
I'm sure there are examples out there describing such locking techniques. I may write an example later as this is quite a long answer already. ;-)
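In the meantime, here is one possible sketch of such a technique (my addition): a minimal read-preferring read-write lock built on a single Condition.

import threading

class ReadWriteLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0

    def acquire_read(self):
        # Many readers may hold the lock at once.
        with self._cond:
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self):
        # A writer takes the condition's lock and waits out the readers.
        # New readers can still slip in while the writer waits, which is
        # what makes this the read-preferring variant.
        self._cond.acquire()
        while self._readers > 0:
            self._cond.wait()

    def release_write(self):
        self._cond.release()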
Notes
Have a look at the threading module documentation and experiment with it a bit. As each threading issue is different, different solutions apply.
While on the subject of threading, have a look at the Python GIL (Global Interpreter Lock). It is important to note that threading may not actually be the best approach in optimizing performance (but this is not your goal). I found this presentation pretty good: https://www.youtube.com/watch?v=zEaosS1U5qY
You can test it by using a lot of threads:
import sys, random, thread
def timeup():
sys.stdout.write("Timer:: Up %f" % time())
def trdfunc(kt, tid):
while True :
sleep(1)
if not kt.isRunning():
if kt.start(1, timeup):
sys.stdout.write("[%d]: started\n" % tid)
else:
if random.random() < 0.1:
kt.stop()
sys.stdout.write("[%d]: stopped\n" % tid)
sys.stdout.write("[%d] remains %f\n" % ( tid, kt.timeRemaining))
kt = KitchenTimer()
kt.start(1, timeup)
for i in range(1, 100):
thread.start_new_thread ( trdfunc, (kt, i) )
trdfunc(kt, 0)
A couple of problems I see:
When a thread sees the timer as not running and tries to start it, the code generally raises an exception due to a context switch between the test and the start. I think raising an exception is too much; alternatively, you could have an atomic testAndStart function.
A similar problem occurs with stop. You could implement a testAndStop function.
Even this code from the timeRemaining function:

if self.isRunning():
    self._timeRemaining = self.duration - self._elapsedTime()

needs some sort of atomicity; perhaps you need to grab a lock before testing isRunning.
If you plan to share this class between threads, you need to address these issues.
In general, this is not a viable solution. You can reproduce this race condition by using a debugger: set breakpoints at some locations in the code; when execution hits one of the breakpoints, freeze the thread and run the code until it hits another breakpoint; then freeze that thread and unfreeze the first one. Using this technique you can interleave thread execution in any way.
The problem is that the more threads and code you have, the more ways there are for their side effects to interleave; the number of interleavings grows exponentially. There is no viable way to test for races in general; it is possible only in some simple cases.
The solutions to this problem are well known: write code that is aware of its side effects, control side effects with synchronisation primitives like locks, semaphores, or queues, or use immutable data where possible.
Maybe a more practical way is to use runtime checks to force the correct call order. For example (pseudocode):
class RacyObject:
    def __init__(self):
        self.__cnt = 0
        ...

    def isReadyAndLocked(self):
        acquire_object_lock
        if self.__cnt % 2 != 0:
            # another thread is ready to start the Job
            return False
        if self.__is_ready:
            self.__cnt += 1
            return True
        # Job is in progress or isn't ready yet
        return False
        release_object_lock

    def doJobAndRelease(self):
        acquire_object_lock
        if self.__cnt % 2 != 1:
            raise RaceConditionDetected("Incorrect order")
        self.__cnt += 1
        do_job()
        release_object_lock
This code will throw an exception if you don't check isReadyAndLocked before calling doJobAndRelease. It can be tested easily using only one thread.
obj = RacyObject()
...

# correct usage
if obj.isReadyAndLocked():
    obj.doJobAndRelease()

define global variables in a function and pass on to a class

Can anyone help please?
I am trying to sample data every x minutes from the class Process defined in the code below (which runs whenever called on by other functions not shown here).
To schedule this, I am running a scheduler function every x minutes, started by the main function and executed by the function minmax_job.
However, my function minmax_job doesn't seem to know the initial value of i.
I have tried again and again with global variables and so on, but it still doesn't know that i = 0 (initially).
import threading
from datetime import datetime
from apscheduler.scheduler import Scheduler  # APScheduler 2.x

i = 0
atc, otc, tssc = 0, 0, 0
atf, otf, tssf = False, False, False

class Process(threading.Thread):
    def __init__(self, buffer3, broadcast_server):
        threading.Thread.__init__(self)
        self.setDaemon(True)
        self.buffer3 = buffer3
        self.factory = broadcast_server

    def run(self):
        today = datetime.now()
        global time_of_last_run
        global atv1, atv2, atv3, otv1, otv2, otv3, tssv1, tssv2, tssv3
        global atf, otf, tssf
        global atc, otc, tssc
        if self.buffer3.startswith('kitchen aquarium: temp:'):
            self.temp = self.buffer3.replace('kitchen aquarium: temp:', '')
            self.factory.broadcast("Aquarium temperature %s" % self.temp)
            if atc == 1 and atf:
                atv1 = float(self.temp)
                atf = False
            elif atc == 2 and atf:
                atv2 = float(self.temp)
                atf = False
            elif atc == 3 and atf:
                atv3 = float(self.temp)
                atf = False

def minmax_job():
    global atv1, atv2, atv3, otv1, otv2, otv3, tssv1, tssv2, tssv3
    global atf, otf, tssf
    global atc, otc, tssc, i
    if i == 3:
        i = 0
        atc = 0
    if i < 4:
        atc = atc + 1
        atf = True
        i = i + 1

if __name__ == '__main__':
    minmax_scheduler = Scheduler()
    minmax_scheduler.add_interval_job(minmax_job, seconds=10)
    minmax_scheduler.start()
i needs to be declared global inside any function that assigns to it, and given a value at the outer scope, as this demo shows:
i = 0

def rabbit():
    global i
    print "rabbit ", i

# main here
if __name__ == '__main__':
    rabbit()
@Pillmuncher, above, had the right idea in a comment; I'll make it a complete answer. You're trying to share data between processes and/or threads by means of global variables. This is a very INCORRECT method. It might work or might not, some of the time, if the phase of the moon is right and you don't change where you put a comment, etc.
The reasons for this are complex, but suffice to say you should find a Comp. Sci. textbook on operating systems and look up the following terms:
forking
thread safety
multiprocessing
protected memory
race condition
message queues
interprocess communication
thread locks
stack memory allocation
heap memory
In Python, both the multiprocessing and threading modules provide many functions for declaring shared memory and keeping it safe. You like safe (even if you don't know it yet). Safe is good. Safe can be (but isn't always) fast.
If you try to use global variables instead of features from multiprocessing and threading modules, you'll shoot yourself in the foot while hanging yourself from a yardarm: slowly and painfully and you'll disdain life itself.
So, check out: http://docs.python.org/2/library/multiprocessing.html
This has many fine examples of doing things the right way. Declare your vars ahead of time, pass them into each thread/process, and live the upstanding life you really want to lead.
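For instance, a shared counter like the i in the question can be passed in explicitly with multiprocessing.Value; a minimal sketch (my illustration, not from the linked docs):

from multiprocessing import Process, Value

def minmax_job(counter):
    # The shared counter arrives as an argument, not via a global.
    with counter.get_lock():
        counter.value = (counter.value + 1) % 4

if __name__ == '__main__':
    i = Value('i', 0)  # typecode 'i' = C int, initial value 0
    workers = [Process(target=minmax_job, args=(i,)) for _ in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(i.value)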
Generally speaking, prefer multiprocessing. Multithreading is rife with trouble, and while it can be faster to execute, you almost never need that speed. That speed comes with danger, and hassle, lots of debugging time (now and later), and being very, very, careful. Better to do what the old-timers (like me) do: make multiprocessing your friend and dump threading in the [primarily Microsoft-centric] trash-heap of don't-go-there.
