I'm learning about concurrent programming with Python.
In the following code, I seem to be having synchronizing issues. How can I fix it?
import threading

N = 1000000
counter = 0

def increment():
    global counter
    for i in range(N):
        counter += 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)
Both threads try to modify counter at the same time, and sometimes they do so simultaneously, which results in some of the increments being lost. Here is a straightforward approach to solving that problem using threading.Lock:
import threading

N = 1000000
counter = 0

def increment(theLock):
    global counter
    for i in range(N):
        theLock.acquire()
        counter += 1
        theLock.release()

lock = threading.Lock()
t1 = threading.Thread(target=increment, args=[lock])
t2 = threading.Thread(target=increment, args=[lock])
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)
The theLock.acquire() and theLock.release() calls surround code that must run in only one thread at a time. In your example the acquire and release could also surround the entire loop, but then the two threads would simply run one after the other, which would be the same as not using threading at all. See the threading documentation and in particular the Lock Objects section.
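For reference, the same critical section can also be written with the lock as a context manager, which releases the lock even if the protected code raises an exception. A minimal sketch of the same counter example using the with statement:

import threading

N = 1000000
counter = 0
lock = threading.Lock()

def increment():
    global counter
    for i in range(N):
        with lock:  # acquires on entry, releases on exit, even if an exception occurs
            counter += 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)  # 2000000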
Related
I'd always assumed a threading.Lock object would act as a mutex to prevent race conditions from occurring in a multithreaded Python script, but I'm finding that either
my assumption is false (contradicting years of experience), or
Python itself has a bug (in, at the very least, versions 2.7-3.9) regarding this.
Theoretically, incrementing a value shared between two threads should be fine as long as you protect the critical section (i.e. the code incrementing that value) with a Lock, i.e. a mutex.
Running this code, I find mutexes in Python not to work as expected. Can anyone enlighten me on this?
#!/usr/bin/env python
from __future__ import print_function

import sys
import threading
import time

Stop = False

class T(threading.Thread):
    def __init__(self, list_with_int):
        self.mycount = 0
        self.to_increment = list_with_int
        super(T, self).__init__()

    def run(self):
        while not Stop:
            with threading.Lock():
                self.to_increment[0] += 1
                self.mycount += 1

intList = [0]
t1 = T(intList)
t2 = T(intList)
t1.start()
t2.start()

Delay = float(sys.argv[1]) if sys.argv[1:] else 3.0
time.sleep(Delay)
Stop = True

t1.join()
t2.join()

total_internal_counts = t1.mycount + t2.mycount
print("Compare:\n\t{total_internal_counts}\n\t{intList[0]}\n".format(**locals()))
assert total_internal_counts == intList[0]
It's possible that this is the answer: the lock object must be persistent and shared amongst the threads.
This works:
#!/usr/bin/env python
from __future__ import print_function

import sys
import threading
import time

Stop = False

class T(threading.Thread):

    lock = threading.Lock()

    def __init__(self, list_with_int):
        self.mycount = 0
        self.to_increment = list_with_int
        super(T, self).__init__()

    def run(self):
        while not Stop:
            with self.lock:
                self.to_increment[0] += 1
                self.mycount += 1

intList = [0]
t1 = T(intList)
t2 = T(intList)
t1.start()
t2.start()

Delay = float(sys.argv[1]) if sys.argv[1:] else 3.0
time.sleep(Delay)
Stop = True

t1.join()
t2.join()

total_internal_counts = t1.mycount + t2.mycount
print("Compare:\n\t{total_internal_counts}\n\t{intList[0]}\n".format(**locals()))
It's possible that...the lock object must be persistent and shared amongst the threads
That's exactly right. threading.Lock() is a constructor call. The run loop in your original example created a new Lock object for each iteration:
while not Stop:
    with threading.Lock():  # creates a new Lock object each time around
        self.to_increment[0] += 1
You fixed it by creating a single Lock object that is used by all the threads, and by every iteration of the loop in each thread.
class T(threading.Thread):
    # Create one Lock object that will be shared by all instances of
    # class T.
    lock = threading.Lock()

    def run(self):
        while not Stop:
            with self.lock:  # acquire and release the one Lock object
                self.to_increment[0] += 1
When you lock a Lock object, the only thing it prevents is other threads locking that same Lock object at the same time.
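To make that concrete, here is a small self-contained sketch (not from the original post) comparing a fresh Lock per iteration with a single shared Lock; the per-iteration version can lose updates because each thread is locking a different object:

import threading

N = 500000

def count(make_lock):
    total = [0]  # shared one-element list, like to_increment in the question

    def work():
        for _ in range(N):
            with make_lock():
                total[0] += 1

    threads = [threading.Thread(target=work) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

shared = threading.Lock()
print("new Lock every iteration:", count(threading.Lock))  # may be less than 1000000: no real mutual exclusion
print("one shared Lock:         ", count(lambda: shared))  # always 1000000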
I am testing a method to run several tasks in parallel. These tasks will run in parallel threads and I want the tasks to repeat until a global variable is set. I am first trying threading to launch the parallel threads, and make sure they will work properly. What I have so far:
import threading
from IPython.display import clear_output
import time

i = 0
j = 0

def main():
    global i
    global j
    t1 = threading.Thread(name="task1", target=task1)
    t2 = threading.Thread(name="task2", target=task2)
    t1.start()
    t2.start()

def task1():
    global i
    i += 1
    time.sleep(10)
    t1 = threading.Thread(name="task1", target=task1)
    t1.start()

def task2():
    global j
    j -= 1
    time.sleep(10)
    t2 = threading.Thread(name="task2", target=task2)
    t2.start()

tmain = threading.Thread(name="main", target=main)
tmain.start()
which starts a main thread that then starts two threads which run task1 and task2. To monitor the current threads and the values of i and j I run:
while True:
    clear_output(wait=True)
    for thread in threading.enumerate():
        print(thread)
    print(i)
    print(j)
    time.sleep(0.1)
(all of this is being run in a Jupyter Notebook).
Running the script above, I noticed some unexpected results. I expect that at any given time there should be at most two threads, one running task1 and one running task2, but instead I observe many more threads of task2 than of task1. These are not ghost or finished threads, because the absolute values of i and j grow disproportionately.
Again, I expect that there should be a symmetric number of threads for both task1 and task2, and I also expect that the absolute values of i and j should grow more proportionately than they do. Any insight on how to mitigate this discrepancy or avoid this issue would be appreciated.
I ran your code in Jupyter and didn't have your problem.
<_MainThread(MainThread, started 139735228168000)>
<Thread(Thread-1, started daemon 139735083251456)>
<Heartbeat(Thread-2, started daemon 139735074858752)>
<HistorySavingThread(IPythonHistorySavingThread, started 139735049680640)>
<Thread(task2, started 139734638634752)>
<Thread(task1, started 139734680598272)>
<Thread(task2, started 139735041287936)>
<Thread(task1, started 139734076618496)>
<Thread(task1, started 139735032895232)>
<Thread(task2, started 139734672205568)>
<Thread(task1, started 139734655420160)>
<Thread(task2, started 139734630242048)>
272
-272
But as you already saw with your own code, there are multiple instances of each task running. So after a task has 'started itself anew', it takes some time before the old thread actually exits.
A solution to your Jupyter problem could be to give the main function control of restarting a finished task. This ensures that only one thread of each task is running at any time.
import threading
from IPython.display import clear_output
import time

i = 0
j = 0
main_stop = False

def task1():
    global i
    i += 1
    time.sleep(4)

def task2():
    global j
    j -= 1
    time.sleep(4)

def main():
    global i
    global j
    t1 = threading.Thread(name="task1", target=task1)
    t2 = threading.Thread(name="task2", target=task2)
    t1.start()
    t2.start()
    while not main_stop:
        if not t1.is_alive():
            del t1
            t1 = threading.Thread(name="task1", target=task1)
            t1.start()
        if not t2.is_alive():
            del t2
            t2 = threading.Thread(name="task2", target=task2)
            t2.start()
    # wait for tasks to complete
    while t1.is_alive():
        time.sleep(0.1)
    while t2.is_alive():
        time.sleep(0.1)

tmain = threading.Thread(name="main", target=main)
tmain.start()

run_time = 30  # seconds
end_time = time.time() + run_time
while time.time() < end_time:
    clear_output(wait=True)
    for thread in threading.enumerate():
        print(thread)
    print(i)
    print(j)
    time.sleep(0.1)

main_stop = True

# wait for main to complete
while tmain.is_alive():
    time.sleep(0.1)
print('program completed')
I need to start two threads, controlling which one starts first, then having them alternating their jobs.
The following code works as expected with do_sleep = True, but it can fail with do_sleep = False.
How can I achieve the same result without using those ugly (and unreliable) sleeps?
The reason why it works with do_sleep = True is that:
Each worker thread gives time to the other thread to start before trying to acquire the lock and start the next job
There is a pause between the start of the first and the second worker that allows the first one to acquire the lock before the second is ready
With do_sleep = False it can fail because:
At the end of each job, each thread can try to acquire the lock for the next cycle before the other thread, executing two consecutive jobs instead of alternating
The second thread could acquire the lock before the first one
Here is the code:
import threading
import time
import random

do_sleep = True

def workerA(lock):
    for i in range(5):
        lock.acquire()
        print('Working A - %s' % i)
        time.sleep(random.uniform(0.2, 1))
        lock.release()
        if do_sleep: time.sleep(0.1)

def workerB(lock):
    for i in range(5):
        if do_sleep: time.sleep(0.1)
        lock.acquire()
        print('Working B - %s' % i)
        time.sleep(random.uniform(0.2, 1))
        lock.release()
        if do_sleep: time.sleep(0.1)

lock = threading.Lock()

t1 = threading.Thread(target=workerA, args=(lock, ))
t2 = threading.Thread(target=workerB, args=(lock, ))

t1.start()
if do_sleep: time.sleep(0.1)
t2.start()

t1.join()
t2.join()

print('done')
EDIT
Using a Queue as suggested by Mike doesn't help, because the first worker would finish the job without waiting for the second.
This is the wrong output of a version after replacing the Lock with a Queue:
Working A - 0
Working A - 1
Working B - 0
Working A - 2
Working B - 1
Working A - 3
Working B - 2
Working A - 4
Working B - 3
Working B - 4
done
This is the wrong output, obtained with do_sleep = False:
Working A - 0
Working A - 1
Working A - 2
Working A - 3
Working A - 4
Working B - 0
Working B - 1
Working B - 2
Working B - 3
Working B - 4
done
This is the correct output, obtained with do_sleep = True:
Working A - 0
Working B - 0
Working A - 1
Working B - 1
Working A - 2
Working B - 2
Working A - 3
Working B - 3
Working A - 4
Working B - 4
done
There are several ways to solve this. One relatively easy one is to use the lock to control access to a separate shared variable: call this other variable owner; it can be set to either A or B. Thread A can only start a job when owner is set to A, and thread B can only start a job when owner is set to B. Then the pseudo-code is (assume thread A here):
while True:
    while True:
        # Loop until I'm the owner
        lock.acquire()
        if owner == A:
            break
        lock.release()
    # Now I'm the owner. And I still hold the lock. Start the job.
    <Grab next job (or start job or finish job, whatever is required to remove it from contention)>
    owner = B
    lock.release()
    <Finish job if not already done. Go get the next one.>
The B thread does the same thing only reversing the if owner and owner = statements. And obviously you can parameterize it so that both actually just run the same code.
EDIT
Here is the working version, with the suggested logic inside an object:
import threading
import time

def workerA(lock):
    for i in range(5):
        lock.acquire_for('A')
        print('Start A - %s' % i)
        time.sleep(0.5)
        print('End A - %s' % i)
        lock.release_to('B')

def workerB(lock):
    for i in range(5):
        lock.acquire_for('B')
        print('Start B - %s' % i)
        time.sleep(2)
        print('End B - %s' % i)
        lock.release_to('A')

class LockWithOwner:

    lock = threading.RLock()
    owner = 'A'

    def acquire_for(self, owner):
        n = 0
        while True:
            self.lock.acquire()
            if self.owner == owner:
                break
            n += 1
            self.lock.release()
            time.sleep(0.001)
        print('Waited for {} to be the owner {} times'.format(owner, n))

    def release_to(self, new_owner):
        self.owner = new_owner
        self.lock.release()

lock = LockWithOwner()
lock.owner = 'A'

t1 = threading.Thread(target=workerA, args=(lock, ))
t2 = threading.Thread(target=workerB, args=(lock, ))
t1.start()
t2.start()
t1.join()
t2.join()

print('done')
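Since workerA and workerB differ only in the label they wait for and the label they hand off to, they can be collapsed into a single parameterized worker, as suggested above. A minimal sketch reusing the LockWithOwner class from the code above (the work_time parameter is only for illustration):

def worker(lock, me, other, work_time):
    # Same logic as workerA/workerB, parameterized by the owner labels.
    for i in range(5):
        lock.acquire_for(me)
        print('Start %s - %s' % (me, i))
        time.sleep(work_time)
        print('End %s - %s' % (me, i))
        lock.release_to(other)

lock = LockWithOwner()
t1 = threading.Thread(target=worker, args=(lock, 'A', 'B', 0.5))
t2 = threading.Thread(target=worker, args=(lock, 'B', 'A', 2))
t1.start()
t2.start()
t1.join()
t2.join()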
You can exclude the possibility of the wrong thread acquiring the lock, avoid relying on time.sleep(...) for correctness, and shorten your code at the same time by using Queue (two queues for two-way communication):
import threading
import time
import random
from queue import Queue  # on Python 2 this was "from Queue import Queue"

def work_hard(name, i):
    print('start %s - %s' % (name, i))
    time.sleep(random.uniform(0.2, 1))
    print('end %s - %s' % (name, i))

def worker(name, q_mine, q_his):
    for i in range(5):
        q_mine.get()
        work_hard(name, i)
        q_his.put(1)

qAB = Queue()
qBA = Queue()

t1 = threading.Thread(target=worker, args=('A', qAB, qBA))
t2 = threading.Thread(target=worker, args=('B', qBA, qAB))

t1.start()
qAB.put(1)  # notice how you don't need time.sleep(...) even here
t2.start()

t1.join()
t2.join()

print('done')
It works as you specified. Alternatively you can use threading.Condition (a combination of acquire, release, wait and notify/notifyAll), but that will be more subtle, especially in terms of which thread goes first.
I have tried Gil Hamilton's answer and it doesn't work for me if I remove all the sleeps. I think it's because my 'main' thread keeps getting the priority. I found out that a better way to synchronize two or more threads is to use a Condition object.
Here is my working alternative lock object with a Condition object inside:
class AltLock():
    def __init__(self, initial_thread):
        self.allow = initial_thread
        self.cond = threading.Condition()

    def acquire_for(self, thread):
        self.cond.acquire()
        while self.allow != thread:
            print("\tLOCK:", thread, "waiting")
            self.cond.wait()
        print("\tLOCK:", thread, "acquired")

    def release_to(self, thread):
        print("\tLOCK: releasing to", thread)
        self.allow = thread
        self.cond.notifyAll()
        self.cond.release()
And this is an example use case (the sleep statements in the thread are not required):
class MyClass():
    def __init__(self):
        self.lock = AltLock("main")

    def _start(self):
        print("thread: Started, wait 2 second")
        time.sleep(2)
        print("---")
        self.lock.acquire_for("thread")
        time.sleep(2)
        print("---")
        print("thread: start lock acquired")
        self.lock.release_to("main")
        return 0

    def start(self):
        self.lock.acquire_for("main")
        self.thread = threading.Thread(target=self._start)
        self.thread.start()
        print("main: releasing lock")
        self.lock.release_to("thread")
        self.lock.acquire_for("main")
        print("main: lock acquired")

myclass = MyClass()
myclass.start()
myclass.lock.release_to("main")  # house keeping
And this is stdout:
LOCK: main acquired
thread: Started, wait 2 second
main: releasing lock
LOCK: releasing to thread
LOCK: main waiting   # the 'main' thread tries to reacquire the lock immediately but gets blocked in wait()
---
LOCK: thread acquired
---
thread: start lock acquired
LOCK: releasing to main
LOCK: main acquired
main: lock acquired
LOCK: releasing to main
I started programming in Python a few weeks ago and was trying to use Semaphores to synchronize two simple threads, for learning purposes. Here is what I've got:
import threading

sem = threading.Semaphore()

def fun1():
    while True:
        sem.acquire()
        print(1)
        sem.release()

def fun2():
    while True:
        sem.acquire()
        print(2)
        sem.release()

t = threading.Thread(target=fun1)
t.start()
t2 = threading.Thread(target=fun2)
t2.start()
But it keeps printing just 1's. How can I interleave the prints?
It is working fine, it's just printing too fast for you to see. Try putting a time.sleep() in both functions (a small amount) to sleep the thread for that much time, so you can actually see both 1 and 2.
Example -
import threading
import time

sem = threading.Semaphore()

def fun1():
    while True:
        sem.acquire()
        print(1)
        sem.release()
        time.sleep(0.25)

def fun2():
    while True:
        sem.acquire()
        print(2)
        sem.release()
        time.sleep(0.25)

t = threading.Thread(target=fun1)
t.start()
t2 = threading.Thread(target=fun2)
t2.start()
Also, you can use Lock/mutex method as follows:
import threading
import time

mutex = threading.Lock()  # is equal to threading.Semaphore(1)

def fun1():
    while True:
        mutex.acquire()
        print(1)
        mutex.release()
        time.sleep(.5)

def fun2():
    while True:
        mutex.acquire()
        print(2)
        mutex.release()
        time.sleep(.5)

t1 = threading.Thread(target=fun1).start()
t2 = threading.Thread(target=fun2).start()
Simpler style using "with":
import threading
import time

mutex = threading.Lock()  # is equal to threading.Semaphore(1)

def fun1():
    while True:
        with mutex:
            print(1)
        time.sleep(.5)

def fun2():
    while True:
        with mutex:
            print(2)
        time.sleep(.5)

t1 = threading.Thread(target=fun1).start()
t2 = threading.Thread(target=fun2).start()
[NOTE]:
The difference between mutex, semaphore, and lock
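As a quick illustration of that difference (not from the linked answer): a Lock admits one holder at a time, while a Semaphore initialized with a larger count admits that many holders at once. A minimal sketch:

import threading
import time

sem = threading.Semaphore(2)  # up to 2 threads may hold it at the same time

def worker(n):
    with sem:
        print('worker %d entered' % n)
        time.sleep(1)
        print('worker %d leaving' % n)

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Two workers enter immediately; the other two wait until a slot is released.
# With threading.Lock() (or Semaphore(1)) only one worker would be inside at a time.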
In fact, I wanted to find asyncio.Semaphore, not threading.Semaphore,
and I believe someone else may want it too.
So, I decided to share the asyncio.Semaphore version; I hope you don't mind.
from asyncio import (
    Task,
    Semaphore,
)
import asyncio
from typing import List


async def shopping(sem: Semaphore):
    while True:
        async with sem:
            print(shopping.__name__)
        await asyncio.sleep(0.25)  # Transfer control to the loop, and it will assign the other (idle) job to run.


async def coding(sem: Semaphore):
    while True:
        async with sem:
            print(coding.__name__)
        await asyncio.sleep(0.25)


async def main():
    sem = Semaphore(value=1)
    list_task: List[Task] = [asyncio.create_task(_coroutine(sem)) for _coroutine in (shopping, coding)]
    """
    # Normally, we would wait until all the tasks have finished, but that is impossible in your case.
    for task in list_task:
        await task
    """
    await asyncio.sleep(2)  # So, I let the main loop wait for 2 seconds, then close the program.


asyncio.run(main())
output
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
8 * 0.25 = 2 seconds; each 0.25-second round prints both shopping and coding, which gives the 16 lines above.
I used this code to demonstrate how one thread can use a Semaphore while the other thread waits (non-blocking) until the Semaphore is available.
This was written using Python 3.6; not tested on any other version.
This only works if the synchronization is being done within the same process; IPC from separate processes will fail using this mechanism.
import threading
from time import sleep

sem = threading.Semaphore()

def fun1():
    print("fun1 starting")
    sem.acquire()
    for loop in range(1, 5):
        print("Fun1 Working {}".format(loop))
        sleep(1)
    sem.release()
    print("fun1 finished")

def fun2():
    print("fun2 starting")
    while not sem.acquire(blocking=False):
        print("Fun2 No Semaphore available")
        sleep(1)
    else:
        print("Got Semphore")
        for loop in range(1, 5):
            print("Fun2 Working {}".format(loop))
            sleep(1)
    sem.release()

t1 = threading.Thread(target=fun1)
t2 = threading.Thread(target=fun2)
t1.start()
t2.start()
t1.join()
t2.join()
print("All Threads done Exiting")
When I run this, I get the following output.
fun1 starting
Fun1 Working 1
fun2 starting
Fun2 No Semaphore available
Fun1 Working 2
Fun2 No Semaphore available
Fun1 Working 3
Fun2 No Semaphore available
Fun1 Working 4
Fun2 No Semaphore available
fun1 finished
Got Semphore
Fun2 Working 1
Fun2 Working 2
Fun2 Working 3
Fun2 Working 4
All Threads done Exiting
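As an aside (not part of the original answer), the poll-and-sleep loop in fun2 can also be written with the timeout parameter of Semaphore.acquire, which blocks for at most the given time and returns False if the semaphore was not acquired. A sketch reusing sem and sleep from the code above:

def fun2():
    print("fun2 starting")
    while not sem.acquire(timeout=1):  # wait up to 1 second per attempt
        print("Fun2 No Semaphore available")
    print("Got Semaphore")
    for loop in range(1, 5):
        print("Fun2 Working {}".format(loop))
        sleep(1)
    sem.release()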
Existing answers are wastefully sleeping
I noticed that almost all answers use some form of time.sleep or asyncio.sleep, which blocks the thread. This should be avoided in real software, because blocking your thread for 0.25, 0.5 or 1 second is unnecessary and wasteful: you could be doing more processing, especially if your application is IO bound, since it already blocks when it does IO, and you are introducing arbitrary delays (latency) into your processing time. If all your threads are sleeping, your app isn't doing anything. Also, these durations are quite arbitrary, which is why every answer sleeps (blocks the thread) for a different amount of time.
The answers use sleeping as a way to get Python's bytecode interpreter to pre-empt the thread after each print line, so that it alternates deterministically between running the two threads. By default, the interpreter pre-empts a thread every 5 ms (sys.getswitchinterval() returns 0.005), and remember that these threads never run in parallel because of Python's GIL.
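As a side note, the switch interval mentioned above can be inspected and tuned directly; a tiny illustration (not part of the original answer):

import sys

print(sys.getswitchinterval())  # 0.005 by default: threads are pre-empted roughly every 5 ms
sys.setswitchinterval(0.001)    # make the interpreter switch between threads more often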
Solution to problem
How can I intercale the prints?
So my answer would be, you do not want to use semaphores to print (or process) something in a certain order reliably, because you cannot rely on thread prioritization in Python. See Controlling scheduling priority of python threads? for more. time.sleep(arbitrarilyLargeEnoughNumber) doesn't really work when you have more than 2 concurrent pieces of code, since you don't know which one will run next - see * below. If the order matters, use a queue, and worker threads:
from threading import Thread
import queue

q = queue.Queue()

def enqueue():
    while True:
        q.put(1)
        q.put(2)

def reader():
    while True:
        value = q.get()
        print(value)

enqueuer_thread = Thread(target=enqueue)
reader_thread_1 = Thread(target=reader)
reader_thread_2 = Thread(target=reader)
reader_thread_3 = Thread(target=reader)

enqueuer_thread.start()
reader_thread_1.start()
reader_thread_2.start()
reader_thread_3.start()
...
Unfortunately in this problem, you don't get to use Semaphore.
*An extra check for you
If you try a modification of the top voted answer but with an extra function/thread to print(3), you'll get:
1
2
3
1
3
2
1
3
...
Within a few prints, the ordering is broken - it's 1-3-2.
You need to use 2 semaphores to do what you want to do, and you need to initialize them at 0.
import threading

SEM_FUN1 = threading.Semaphore(0)
SEM_FUN2 = threading.Semaphore(0)

def fun1() -> None:
    for _ in range(5):
        SEM_FUN1.acquire()
        print(1)
        SEM_FUN2.release()

def fun2() -> None:
    for _ in range(5):
        SEM_FUN2.acquire()
        print(2)
        SEM_FUN1.release()

threading.Thread(target=fun1).start()
threading.Thread(target=fun2).start()
SEM_FUN1.release()  # Trigger fun1
Output (strict alternation, five times each):
1
2
1
2
1
2
1
2
1
2
I have the following code:
def countdown():
    def countdown1():
        print 'countdown1'
    def countdown2():
        def countdown3():
            takePic()
            self.pic.set_markup("<span size='54000'>1</span>");
            print 1
        t3 = Timer(1.0, countdown3)
        t3.start()
        self.pic.set_markup("<span size='54000'>2</span>");
        print 2
    t2 = Timer(1.0, countdown2)
    t2.start()
    self.pic.set_markup("<span size='54000'>3</span>");
    print 3
    t1 = Timer(1.0, countdown1)
    t1.start()
countdown()
It should show a countdown from 3. The number 3 appears, but afterwards nothing happens. help?
Your main thread is probably exiting before any timers fire. The simplest and crudest way to fix this is to get the main thread to sleep for as long as necessary. A saner option is to signal something like a semaphore at the end of countdown3 and wait on it in the main thread.
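A minimal sketch of that signalling idea, using threading.Event in place of a semaphore (countdown, countdown3 and takePic are assumed from the question; only the Event is new):

from threading import Event

countdown_done = Event()

# At the end of countdown3(), after takePic(), add:
#     countdown_done.set()

countdown()
countdown_done.wait()  # the main thread blocks here until countdown3 signals completion
print 'all done'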
A more elegant solution, which can be integrated with a broader scheduling and asynchrony framework, is to invert the flow of control using generators:
def countdown():
    self.pic.set_markup("<span size='54000'>3</span>");
    print 3
    yield 1.0
    print 'countdown1'
    self.pic.set_markup("<span size='54000'>2</span>");
    print 2
    yield 1.0
    self.pic.set_markup("<span size='54000'>1</span>");
    print 1
    yield 1.0
    takePic()

for t in countdown():
    time.sleep(t)
Why not just .join() your timer threads after you .start() them, so that the rest of your code waits until the timers are done to continue?
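For example (a minimal sketch of that suggestion, not the question's full code):

t1 = Timer(1.0, countdown1)
t1.start()
t1.join()  # block here until the timer has fired and countdown1 has returned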
Are you sure some other command isn't blocking? Like set_markup? A simplified example works for me:
>>> from threading import Timer
>>> def lvl1():
        def lvl2():
            print "evaling lvl2"
            def lvl3():
                print "evaling lvl3"
                print "TakePic()"
            print 1
            t3 = Timer(1.0, lvl3)
            t3.start()
        print 2
        t2 = Timer(2.0, lvl2)
        t2.start()
>>> lvl1()
2
>>> evaling lvl2
1
evaling lvl3
TakePic()