I got this as an interview problem a few days ago. I don't really know parallel programming, and the obvious solution I've tried isn't working.
The question is: write two functions, one printing "foo" and one printing "bar", that will be run on separate threads. How do I ensure the output is always:
foo
bar
foo
bar
...
Here's what I've tried:
from threading import Lock, Thread

class ThreadPrinting:
    def __init__(self):
        self.lock = Lock()
        self.count = 10

    def foo(self):
        for _ in range(self.count):
            with self.lock:
                print("foo")

    def bar(self):
        for _ in range(self.count):
            with self.lock:
                print("bar")

if __name__ == "__main__":
    tp = ThreadPrinting()
    t1 = Thread(target=tp.foo)
    t2 = Thread(target=tp.bar)
    t1.start()
    t2.start()
But this just produces 10 "foo"s and then 10 "bar"s. Seemingly the same thread manages to loop around and re-acquire the lock before the other. What might be the solution here? Thank you.
this just produces 10 "foo"s and then 10 "bar"s. Seemingly the same thread manages to loop around and re-acquire the lock before the other.
No surprise there. The problem with using a threading.Lock object (a.k.a. a "mutex") in this way is that, like the (default) mutexes in most programming systems, it makes no attempt to be fair.
The very next thing either of your two threads does after releasing the lock is immediately try to acquire it again. Meanwhile, the other thread is sleeping (a.k.a. "blocked"), waiting for its turn to acquire the lock.
The goal of most operating systems, when there is heavy demand for CPU time, is to maximize the amount of useful work that the CPU(s) can do. The best way to do that is to award the lock to the thread that already is running on some CPU instead of wasting time waking up some other thread that is sleeping.
That strategy works well in programs that use locks the way locks were meant to be used—that is to say, programs where the threads spend most of their time unlocked, and only briefly grab a lock, every so often, in order to examine or update some (group of) shared variables.
In order to make your threads take turns printing their messages, you are going to need to find some way to let the threads explicitly say to each other, "It's your turn now."
See my comments on your question for a hint about how you might do that.
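For instance (my own illustration of that hint, not code from the question), a pair of threading.Event objects lets each thread hand the turn to the other explicitly. Unlike a bare notify, an Event stays set until it is cleared, so no wakeup is lost:

from threading import Event, Thread

foo_turn = Event()
bar_turn = Event()
foo_turn.set()  # foo goes first

def foo():
    for _ in range(10):
        foo_turn.wait()   # block until it is foo's turn
        foo_turn.clear()
        print("foo")
        bar_turn.set()    # hand the turn to bar

def bar():
    for _ in range(10):
        bar_turn.wait()
        bar_turn.clear()
        print("bar")
        foo_turn.set()

Thread(target=foo).start()
Thread(target=bar).start()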
@Solomon Slow provided a great explanation and pointed me in the right direction. I initially wanted some kind of a "lock with a value" that can be acquired only conditionally. But no such thing really exists, and busy-waiting in a cycle of "acquire lock - check variable - loop around" is not great. Instead I solved this with a pair of threading.Condition objects that the threads use to talk to each other. I'm sure there's a simpler solution, but here's mine:
from threading import Thread, Condition

class ThreadPrinting:
    def __init__(self):
        self.fooCondition = Condition()
        self.barCondition = Condition()
        self.count = 10

    def foo(self):
        for _ in range(self.count):
            with self.fooCondition:
                self.fooCondition.wait()
                print("foo")
            with self.barCondition:
                self.barCondition.notify()

    def bar(self):
        with self.fooCondition:
            self.fooCondition.notify()  # Bootstrap the cycle
        for _ in range(self.count):
            with self.barCondition:
                self.barCondition.wait()
                print("bar")
            with self.fooCondition:
                self.fooCondition.notify()

if __name__ == "__main__":
    tp = ThreadPrinting()
    t1 = Thread(target=tp.foo)
    t2 = Thread(target=tp.bar)
    t1.start()
    t2.start()
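One caveat with my solution: Condition.notify() only wakes a thread that is already waiting, so if one thread has not reached wait() yet when the other notifies, the signal is lost and both threads can deadlock. A more robust variant (a sketch I am adding here, not the solution as originally posted) uses a single Condition plus an explicit turn variable that is re-checked in a loop:

from threading import Condition, Thread

class ThreadPrinting:
    def __init__(self):
        self.cond = Condition()
        self.turn = "foo"  # explicit state; a missed notify is no longer fatal
        self.count = 10

    def foo(self):
        for _ in range(self.count):
            with self.cond:
                while self.turn != "foo":
                    self.cond.wait()
                print("foo")
                self.turn = "bar"
                self.cond.notify()

    def bar(self):
        for _ in range(self.count):
            with self.cond:
                while self.turn != "bar":
                    self.cond.wait()
                print("bar")
                self.turn = "foo"
                self.cond.notify()

if __name__ == "__main__":
    tp = ThreadPrinting()
    Thread(target=tp.foo).start()
    Thread(target=tp.bar).start()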
The way I did it was just to have the first thread print "foo", then sleep one second before the second thread starts printing "bar". Both functions sleep for two seconds between prints. This makes them alternate, printing one word per second.
from threading import Thread
import time

def foo():
    num = 0
    while num < 10:
        print("foo")
        num = num + 1
        time.sleep(2)

def bar():
    num = 0
    while num < 10:
        print("bar")
        num = num + 1
        time.sleep(2)

t1 = Thread(target=foo)
t2 = Thread(target=bar)
t1.start()
time.sleep(1)
t2.start()
I tried this for 100 of each 'foo' and 'bar' and it still alternated.
I have a _global_variable = Big_Giant_Class(). Big_Giant_Class takes a long time to construct, but it also has constantly refreshing 'live data' behind it, so I always want as fresh an instance of it as possible. It's not IO-bound, just a load of CPU computation.
Further, my program has a number of functions that reference that global instance of Big_Giant_Class.
I'm trying to figure out a way to create Big_Giant_Class in an endless loop (so I always have the latest-and-greatest!), but without it being blocking to all the other functions that reference _global_variable.
Conceptually, I kind of figure the code would look like:
import asyncio  # needed by the conceptual code below
import time

class Big_Giant_Class():
    def __init__(self, val, sleep_me = False):
        self.val = val
        if sleep_me:
            time.sleep(10)

    def print_val(self):
        print(self.val)

async def run_loop():
    while True:
        new_instance_value = await asyncio.run(Big_Giant_Class(val = 1))  # <-- takes a while
        # somehow assign new_instance_value to _global_variable when it's done!

def do_stuff_that_cant_be_blocked():
    global _global_variable
    return _global_variable.print_val()

_global_variable = Big_Giant_Class(val = 0)

if __name__ == "__main__":
    asyncio.run(run_loop())  # <-- maybe I have to do this somewhere?
    for i in range(20):
        do_stuff_that_cant_be_blocked()
        time.sleep(1)
Conceptual Out:
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
1
The kicker is, I have a number of functions [i.e., do_stuff_that_cant_be_blocked] that can't be blocked.
I simply want them to use the last _global_variable value (which gets periodically updated by some non-blocking... thing?). That's why I figure I can't await the results, because that would block the other functions?
Is it possible to do something like that? I've done very little asyncio, so apologies if this is basic. I'm open to any other packages that might be able to do this (although I don't think Trio works, because I have incompatible required packages).
Thanks for any help in advance!
So you have two CPU-bound "loops" in your program. Python has a quirky threading model: first off, Python CANNOT do two computations at literally the same time. Threading and async only allow Python to fake doing two things at once.
Threading lets you "do" two things because Python switches between the threads and makes progress on each, but it never runs both at the same moment.
Async lets you "do" two things if you can await the operation; while Python awaits, it can jump off and do something else. However, awaiting a CPU-bound operation will not let it jump away and do other things.
The easiest solution is to use a thread, though there will be some time where each loop blocks because work is being done on the other. But the work will be split about 50/50 between the threads.
from threading import Thread

_global_variable = some_initial_value

def update_global():
    global _global_variable
    while True:
        _global_variable = get_new_global_instance()
        call_some_event()

def main():
    background_thread = Thread(target=update_global, daemon=True)
    background_thread.start()
    while True:
        do_important_work()
The harder but truly multiprocessing version would be to use a Process instead of a Thread but would also need to use either shared state or a queue or something like that.
https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes
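A rough sketch of that Process-based variant (reusing the placeholder names from the snippet above; note that the instances travelling over the queue must be picklable):

from multiprocessing import Process, Queue
import queue

def builder(q):
    # Runs in its own process, so the CPU-heavy rebuild does not
    # compete with the main process for the GIL.
    while True:
        q.put(get_new_global_instance())  # placeholder from above

def main():
    q = Queue()
    Process(target=builder, args=(q,), daemon=True).start()
    latest = some_initial_value           # placeholder from above
    while True:
        try:
            while True:                   # drain, keeping only the freshest instance
                latest = q.get_nowait()
        except queue.Empty:
            pass
        do_important_work()               # placeholder from above; uses `latest`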
I would like to know the progress of my processes. At the moment what I am using is not very effective. Here is an MWE (minimal working example):
import time
from multiprocessing import Pool as ProcessPool
import progressbar
import random

def some_random_calculation(n):
    with progressbar.ProgressBar(max_value=n) as bar:
        for i in range(0, n):
            time.sleep(1)
            bar.update(i)

if __name__ == '__main__':
    arguments = [random.randint(4, 10) for i in range(4)]
    pool = ProcessPool(4)
    results = pool.map_async(some_random_calculation, arguments)
    print(results.get())
    pool.close()
    pool.join()
In this case I am using progressbar2; however, when there is more than one process, the output is continuously updated on the same line.
The bars only appear in sorted order because a new bar is created by another process after the first one finishes; while multiple processes are running, a single bar line gets overwritten by all of them.
I am looking for a fix to this problem; it would be cool to have n bars dynamically updated. But probably there is a smarter way to get a sense of the progress of different processes. Any advice?
This is far from perfect; the subject gets complex if you want to get everything right. But one thing is certain: you should monitor the progress from outside the subprocesses.
The fastest and probably easiest way is to give each worker a poll function that returns its status, so the governor outside can keep the user updated on the progress. That would look something like this:
import os, signal
from threading import Thread, enumerate as t_enumerate
from time import time, sleep
from random import random

clear = lambda: os.system('cls' if os.name == 'nt' else 'clear')

def sig_handler(signal, frame):
    for t in t_enumerate():
        if t.name != 'MainThread':
            t.stop()
    exit(0)

signal.signal(signal.SIGINT, sig_handler)

class worker(Thread):
    def __init__(self, init_value=0):
        Thread.__init__(self)
        self.init_value = init_value
        self.progress = 0
        self.run_state = True
        self.start()  # Start ourselves instead of from outside.

    def poll(self):
        return self.progress

    def stop(self):
        self.run_state = False

    def run(self):
        main_thread = None
        for t in t_enumerate():
            if t.name == 'MainThread':
                main_thread = t
                break
        while main_thread and self.run_state and main_thread.is_alive():
            for i in range(0, 100):
                self.init_value *= i
                self.progress = i
                sleep(random())
            break  # Yeah, kinda unnecessary while loop. Meh..

workers = [worker(0) for i in range(4)]
while len(t_enumerate()) > 1:
    clear()
    for index, worker_handle in enumerate(workers):
        progress = worker_handle.poll()
        print(f'Thread {index} is at {progress}/100.')
    sleep(1)
The other approach would be for each thread to acquire a lock on the output before printing. But that adds complexity: for starters, the threads would all need to synchronize on when it is time to print, so that one does not grab the lock while you are in some other part of the output process where something else is being printed. Otherwise they would print in the wrong order, or you would need to keep track of which row to backtrack to and re-write.
There's probably a threading guru here with a better answer, but this is my two cents: just add a poller function, do a combined status update, and live with the very small processing cost of polling each thread. Unless you have thousands of threads, polling repeatedly will have no noticeable performance impact.
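To make "monitor from outside" concrete for the original multiprocessing example, here is one possible sketch (mine, not a canonical recipe): each worker reports progress over a Manager queue, and the parent process owns all the printing:

import time
import random
import queue
from multiprocessing import Pool, Manager

def some_random_calculation(args):
    worker_id, n, progress_q = args
    for i in range(n):
        time.sleep(1)                        # stand-in for the real work
        progress_q.put((worker_id, i + 1, n))
    return n

if __name__ == '__main__':
    manager = Manager()
    progress_q = manager.Queue()  # a plain multiprocessing.Queue cannot be passed through a Pool
    jobs = [(i, random.randint(4, 10), progress_q) for i in range(4)]
    with Pool(4) as pool:
        result = pool.map_async(some_random_calculation, jobs)
        while not result.ready():            # the parent does all the printing
            try:
                worker_id, done, total = progress_q.get(timeout=0.5)
                print('worker {}: {}/{}'.format(worker_id, done, total))
            except queue.Empty:
                pass
        print(result.get())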
I must be missing something here but this simple example of two threads trying to modify a global variable in a function is not giving the expected result:
from threading import Thread, Lock

some_var = 0

def some_func(id):
    lo = Lock()
    with lo:
        global some_var
        print("{} here!".format(id))
        for i in range(1000000):
            some_var += 1
        print("{} leaving!".format(id))

t1 = Thread(target=some_func, args=(1,))
t2 = Thread(target=some_func, args=(2,))
t1.start()
t2.start()
t1.join()
t2.join()
print(some_var)
outputs:
1 here!
2 here!
2 leaving!
1 leaving!
1352010
As you can see, both threads enter the section that should be locked simultaneously, and the incrementing of the global variable 'some_var' gets interleaved as a result.
It looks like the Lock is just not working for some reason.
For a range up to 10000 it appears to work, but that is probably just because the GIL is not released during such short calculations.
What is going on?
I'm using Python 3.3.2, 64-bit.
Lock() is a factory function: every call creates an entirely new lock object, so each of your threads creates, and then acquires, its own private lock. That's why it doesn't work: the two threads are locking two different locks.
A lock is one of the few things you can happily declare as a global, because you absolutely want every thread to see the same Lock() object. Try this instead:
from threading import Thread, Lock

some_var = 0
lo = Lock()

def some_func(id):
    global lo
    with lo:
        global some_var
        print("{} here!".format(id))
        for i in range(1000000):
            some_var += 1
        print("{} leaving!".format(id))
Every time your function is called, a new lock is created, so each thread ends up holding a different lock. The Lock object should be created globally, so every thread can see whether the same lock is held by another. Try moving the lock creation to the global scope.
Alternatively, you can define the lock in your main code and pass it to the called function:

import threading
from threading import Thread

lock = threading.Lock()
t1 = Thread(target=some_func, args=(1, lock))
t2 = Thread(target=some_func, args=(2, lock))
t1.start()
t2.start()

This way there is only one lock. It is better to avoid global variables whenever possible.
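For completeness, some_func would then take the lock as its second parameter, something like this sketch:

some_var = 0

def some_func(id, lock):
    global some_var
    with lock:  # every thread uses the one lock object passed in
        print("{} here!".format(id))
        for i in range(1000000):
            some_var += 1
        print("{} leaving!".format(id))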
I'm trying to write a mini-game to practice my Python threading skills. The game involves timed bombs and the cities that contain them.
Here is my code:
import threading
from time import sleep

class City(threading.Thread):
    def __init__(self, name):
        super().__init__()
        self.name = name
        self.bombs = []        # was None; must be a list to append to
        self.activeBomb = []   # was None; must be a list to append to
        self.bombID = 0
        self.exploded = False

    def addBomb(self, name, time, puzzle, answer, hidden=False):
        self.bombs.append(Bomb(name, self.bombID, time, puzzle, answer, hidden))
        self.activeBomb.append(self.bombID)
        self.bombID += 1

    def run(self):
        for b in self.bombs:
            b.start()
        while True:
            # listen to the bombs in self.bombs  # The part that I don't know how
            # if one explodes
            #     print(self.name + ' has been destroyed')
            #     break
            # if one is disarmed
            #     remove the bombID from activeBomb
            # if all bombs are disarmed (no activeBomb left)
            #     print('The city of ' + self.name + ' has been cleansed')
            #     break
            pass

class Bomb(threading.Thread):
    def __init__(self, name, bombID, time, puzzle, answer, hidden=False):
        super(Bomb, self).__init__()
        self.name = name
        self.bombID = bombID
        self._timer = time
        self._MAXTIME = time
        self._disarmed = False
        self._puzzle = puzzle
        self._answer = answer
        self._denoted = False
        self._hidden = hidden

    def run(self):
        # A bomb goes off!!
        if not self._hidden:
            print('You have ' + str(self._MAXTIME)
                  + ' seconds to solve the puzzle!')
            print(self._puzzle)
        while True:
            if self._denoted:
                print('BOOM')
                # Communicate to city that bomb is detonated
                break
            elif not self._disarmed:
                if self._timer == 0:
                    self._denoted = True
                else:
                    self._timer -= 1
                    sleep(1)
            else:
                print('You have successfully disarmed bomb ' + str(self.name))
                # Communicate to city that this bomb is disarmed
                break

    def answerPuzzle(self, ans):
        print('Is answer ' + str(ans) + ' ?')
        if ans == self._answer:
            self._disarmed = True
        else:
            self._denoted = True  # was a typo: self._denotaed

    def __eq__(self, bomb):
        return self.bombID == bomb.bombID

    def __hash__(self):
        return id(self)
I currently don't know a good way for the City class to keep track of the status of its bombs.
My first thought was a for loop having the City check every bomb, but that felt clumsy and inefficient.
So here is the question:
What is the most efficient way of implementing Bomb and City so that the city immediately knows about a bomb's state change, without having to poll every bomb every second?
PS: I do NOT mean to use this program to set off real bombs, so relax :D
This is a good case for a queue. Here is an example of the so-called producer-consumer pattern.
The worker threads run forever, until your main program is done (that is what the daemon flag and the "while True" are for). They diligently monitor in_queue for work packages and process them until none are left, so when in_queue is joined, the worker threads' jobs are done. The out_queue here is an optional downstream processing step, letting you assemble the pieces from the worker threads into a summary form; useful when the workers run inside a function.
If you need output, such as each worker thread printing results to the screen or writing to a single file, don't forget to use a lock or semaphore! Otherwise, the output will stumble over itself.
Good luck!
from threading import Thread
import queue  # this module was named "Queue" in Python 2

in_queue = queue.Queue()
out_queue = queue.Queue()

def work():
    while True:
        try:
            sonId = in_queue.get()
            ### do your things here
            result = sonId + 1  # was "sonID", a NameError
            ### you can even put your thread results again in another queue here
            out_queue.put(result)  ### optional
        except Exception:
            pass
        finally:
            in_queue.task_done()

for i in range(20):
    t = Thread(target=work)
    t.daemon = True
    t.start()

for son in range(10):
    in_queue.put(son)

in_queue.join()

while not out_queue.empty():
    result = out_queue.get()
    ### do something with your result here
    out_queue.task_done()

out_queue.join()
The standard way of doing something like this is to use a queue - one thread watches the queue and waits for an object to handle (allowing it to idle happily), and the other thread pushes items onto the queue.
Python has the queue module (Queue in 2.x). Construct a queue in your listener thread and get() on it - this will block until something gets put on.
In your other thread, when a relevant event occurs, push it onto the queue and the listener thread will wake up and handle it. If you do this in a loop, you have the behaviour you want.
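Applied to the bomb/city game, that could look roughly like this sketch (the layout and names are mine, not from the question):

import threading
import queue

events = queue.Queue()  # shared channel from the bombs to their city

class Bomb(threading.Thread):
    def __init__(self, bomb_id, fuse_seconds):
        super().__init__(daemon=True)
        self.bomb_id = bomb_id
        self.fuse_seconds = fuse_seconds
        self.disarmed = threading.Event()

    def run(self):
        # wait() returns True if the bomb was disarmed in time,
        # False when the fuse runs out.
        if self.disarmed.wait(timeout=self.fuse_seconds):
            events.put(('disarmed', self.bomb_id))
        else:
            events.put(('exploded', self.bomb_id))

class City(threading.Thread):
    def __init__(self, name, bombs):
        super().__init__()
        self.name = name
        self.bombs = bombs

    def run(self):
        active = {b.bomb_id for b in self.bombs}
        for b in self.bombs:
            b.start()
        while active:
            status, bomb_id = events.get()  # blocks; no per-second polling
            if status == 'exploded':
                print(self.name + ' has been destroyed')
                return
            active.discard(bomb_id)
        print('The city of ' + self.name + ' has been cleansed')

# Usage: City('Gotham', [Bomb(0, 3), Bomb(1, 5)]).start()
# Disarming a bomb in time is some_bomb.disarmed.set().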
The easiest way would be to use a scheduler library, e.g. https://docs.python.org/2/library/sched.html. Using it you can simply schedule bombs to call a function or method at the time they go off. This is what I would recommend if you did not want to learn about threads.
E.g.
import time
import sched

s = sched.scheduler(time.time, time.sleep)

class Bomb():
    def explode(self):
        if not self._disarmed:
            print("BOOM")

    def __init__(self, time):
        self._MAXTIME = time       # was never assigned in the original snippet
        self._disarmed = False
        s.enter(self._MAXTIME, 1, self.explode)
However, that way you will not learn about threads.
If you really want to use threads directly, then you can simply let each bomb sleep until it is time to go off. E.g.

import time
import threading

class Bomb(threading.Thread):
    def run(self):
        time.sleep(self._MAXTIME)  # was "time.sleep.(...)", a syntax error
        if not self._disarmed:
            print("BOOM")

However, this is not a nice way to handle threads, since they will block your application: you will not be able to exit until you stop the threads. You can avoid that by making each thread a daemon thread: bomb.daemon = True.
In some cases, the best way to handle this is actually to "wake up" each second and check the state of the world. This may be the case when you need to perform some cleanup actions when the thread is stopped, e.g. closing a file. Checking each second may seem wasteful, but it is actually a proper way to handle such problems: modern desktop computers are mostly idle, and being interrupted for a few milliseconds every second will not cause them much sweat.
import time
import threading

class Bomb(threading.Thread):
    def run(self):
        while not self._disarmed:
            if time.time() > self.time_to_explode:  # was "time.now()", which does not exist
                print("BOOM")
                break
            else:
                time.sleep(1)  # was "time.sleep.(1)", a syntax error
Before you start "practising threading with Python", I think it is important to understand the Python threading model: it is loosely based on the Java threading model, but comes with more restrictions:
https://docs.python.org/2/library/threading.html
The design of this module is loosely based on Java’s threading model.
However, where Java makes locks and condition variables basic behavior
of every object, they are separate objects in Python. Python’s Thread
class supports a subset of the behavior of Java’s Thread class;
currently, there are no priorities, no thread groups, and threads
cannot be destroyed, stopped, suspended, resumed, or interrupted. The
static methods of Java’s Thread class, when implemented, are mapped to
module-level functions.
Locks being separate objects, rather than per-object behavior, means less independent scheduling even when different objects are accessed, because the very same lock objects may be needed.
And for some Python implementations, threading is not really fully concurrent:
http://uwpce-pythoncert.github.io/EMC-Python300-Spring2015/html_slides/07-threading-and-multiprocessing.html#slide-5
A thread is the entity within a process that can be scheduled for execution.
Threads are lightweight processes, run in the address space of an OS process.
These threads share the memory and the state of the process. This allows multiple threads access to data in the same scope.
Python threads are true OS level threads.
Threads can not gain the performance advantage of multiple processors due to the Global Interpreter Lock (GIL).
http://uwpce-pythoncert.github.io/EMC-Python300-Spring2015/html_slides/07-threading-and-multiprocessing.html#slide-6
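The GIL's effect is easy to demonstrate with a toy benchmark (my own illustration, not taken from the slides): a CPU-bound task on two threads takes about as long as running it twice sequentially.

import time
from threading import Thread

def count(n):
    while n > 0:
        n -= 1  # pure-CPU work; the GIL is held throughout

N = 10_000_000

start = time.perf_counter()
count(N); count(N)  # twice, sequentially
print("sequential:", time.perf_counter() - start)

start = time.perf_counter()
t1 = Thread(target=count, args=(N,))
t2 = Thread(target=count, args=(N,))
t1.start(); t2.start(); t1.join(); t2.join()
print("two threads:", time.perf_counter() - start)  # roughly the same on CPython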
I have a "do..., until..." structure in Python as follows:
while True:
    if foo() == bar():
        break
It works fine (breaks out in the end) in most cases. However, in some cases, where the condition is never met, it gets stuck.
Figuring out what those cases are is kind of difficult, since there is essentially a random process behind it. So I wish to set a "timeout" on the while loop.
Say, if the loop has been running for 1 s but still has not stopped, I wish the loop to terminate itself.
How may I do this?
Update: Here is the actual code:
while True:
    possibleJunctions = junctionReachability[junctions.index(currentJunction)]
    nextJunction = random.choice(filter(lambda (jx, jy): (jx - currentJunction[0]) * (endJunction[0] - currentJunction[0]) > 0 or (jy - currentJunction[1]) * (endJunction[1] - currentJunction[1]) > 0, possibleJunctions) or possibleJunctions)
    if previousJunction != nextJunction:  # never go back
        junctionSequence.append(nextJunction)
        previousJunction = currentJunction
        currentJunction = nextJunction
        if currentJunction == endJunction:
            break
import time

loop_start = time.time()
while time.time() - loop_start <= 1:
    if foo() == bar():
        break
EDIT
Dan Doe's solution is simplest and best if your code is synchronous (just runs in a single thread) and you know that the foo and bar functions always terminate within some period of time.
If you have asynchronous code (like a GUI), or if the foo and bar functions you use to test for termination conditions can themselves take too long to complete, then read on.
Run the loop inside a separate thread/process. Run a timer in another process. Once the timer expires, set a flag that would cause the loop to terminate.
Something like this (warning: untested code):
import multiprocessing
import time

SECONDS = 10

def worker(event):
    """Does stuff until work is complete, or until signaled to terminate by the timer."""
    while not event.is_set():
        if foo() == bar():
            break

def timer(event):
    """Signals the worker to terminate immediately."""
    time.sleep(SECONDS)
    event.set()

def main():
    """Kicks off subprocesses and waits for both of them to terminate."""
    # Pass the Event explicitly so this also works where the "spawn"
    # start method is the default (e.g. Windows); a module-level Event
    # would not be shared across processes there.
    event = multiprocessing.Event()
    worker_process = multiprocessing.Process(target=worker, args=(event,))
    timer_process = multiprocessing.Process(target=timer, args=(event,))
    timer_process.start()
    worker_process.start()
    timer_process.join()
    worker_process.join()

if __name__ == "__main__":
    main()
If you were worried about the foo and bar functions themselves taking too long to complete, you could explicitly terminate the worker process instead of waiting for it to check the event.
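A simpler variant of the same idea (my sketch): let the main process do the terminating by joining the worker with a timeout, so no timer process is needed at all.

def main():
    event = multiprocessing.Event()
    worker_process = multiprocessing.Process(target=worker, args=(event,))
    worker_process.start()
    worker_process.join(timeout=SECONDS)  # wait at most SECONDS
    if worker_process.is_alive():
        worker_process.terminate()        # hard-stop even if foo()/bar() hang
        worker_process.join()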
I recommend using a counter. This is a common trick to detect non-convergence.
import sys

maxiter = 10000
while True:
    if stopCondition():
        break
    maxiter = maxiter - 1
    if maxiter <= 0:
        print("Did not converge.", file=sys.stderr)  # Python 2: print >>sys.stderr, "..."
        break
This requires the least overhead and usually adapts best to different CPUs: even on a faster CPU you want the same termination behavior, which a time-based timeout does not give you.
However, it would be even better to detect being stuck directly, e.g. with some criterion function that is no longer improving.
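For example, here is a hypothetical sketch of such a criterion, where step() is a stand-in for one iteration of the random process and stopCondition() is the test from the loop above: track the best value seen so far and give up after a fixed number of iterations without improvement.

import sys

best = float("inf")
stall = 0
MAX_STALL = 1000  # iterations allowed without improvement

while True:
    value = step()        # hypothetical: one iteration, returning a quality measure
    if value < best:      # any improvement resets the stall counter
        best = value
        stall = 0
    else:
        stall += 1
    if stopCondition():
        break
    if stall >= MAX_STALL:
        print("No improvement in", MAX_STALL, "iterations; giving up.", file=sys.stderr)
        break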