I'd always assumed a threading.Lock object would act as a mutex and prevent race conditions from occurring in a multithreaded Python script, but I'm finding that either
my assumption is false (contradicting years of experience), or
Python itself has a bug (in, at the very least, versions 2.7-3.9) regarding this.
Theoretically, incrementing a value shared between two threads should be fine as long as you protect the critical section (i.e. the code incrementing that value) with a Lock, i.e. a mutex.
Running this code, I find that mutexes in Python do not work as I expect. Can anyone enlighten me on this?
#!/usr/bin/env python
from __future__ import print_function
import sys
import threading
import time

Stop = False

class T(threading.Thread):
    def __init__(self, list_with_int):
        self.mycount = 0
        self.to_increment = list_with_int
        super(T, self).__init__()

    def run(self):
        while not Stop:
            with threading.Lock():
                self.to_increment[0] += 1
                self.mycount += 1

intList = [0]
t1 = T(intList)
t2 = T(intList)
t1.start()
t2.start()

Delay = float(sys.argv[1]) if sys.argv[1:] else 3.0
time.sleep(Delay)
Stop = True
t1.join()
t2.join()

total_internal_counts = t1.mycount + t2.mycount
print("Compare:\n\t{total_internal_counts}\n\t{intList[0]}\n".format(**locals()))
assert total_internal_counts == intList[0]
It's possible that this is the answer: the lock object must be persistent and shared amongst the threads.
This works:
#!/usr/bin/env python
from __future__ import print_function
import sys
import threading
import time

Stop = False

class T(threading.Thread):
    lock = threading.Lock()

    def __init__(self, list_with_int):
        self.mycount = 0
        self.to_increment = list_with_int
        super(T, self).__init__()

    def run(self):
        while not Stop:
            with self.lock:
                self.to_increment[0] += 1
                self.mycount += 1

intList = [0]
t1 = T(intList)
t2 = T(intList)
t1.start()
t2.start()

Delay = float(sys.argv[1]) if sys.argv[1:] else 3.0
time.sleep(Delay)
Stop = True
t1.join()
t2.join()

total_internal_counts = t1.mycount + t2.mycount
print("Compare:\n\t{total_internal_counts}\n\t{intList[0]}\n".format(**locals()))
It's possible that...the lock object must be persistent and shared amongst the threads
That's exactly right. threading.Lock() is a constructor call. The run loop in your original example created a new Lock object for each iteration:
while not Stop:
    with threading.Lock():  # creates a new Lock object each time 'round
        self.to_increment[0] += 1
You fixed it by creating a single Lock object that is shared by all the threads and by every iteration of the loop in each thread.
class T(threading.Thread):
    # Create one Lock object that will be shared by all instances of
    # class T.
    lock = threading.Lock()

    def run(self):
        while not Stop:
            with self.lock:  # lock and release the one shared Lock object
                self.to_increment[0] += 1
When you lock a Lock object, the only thing it prevents is other threads locking the same Lock object at the same time. The data itself is only protected because every thread agrees to acquire that one shared Lock before touching it.
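To make that concrete, here is a small sketch (mine, not part of the original answer) showing that two distinct Lock objects never exclude each other, which is exactly the situation when a fresh Lock is constructed on every loop iteration:
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

with lock_a:
    # lock_b is a different object, so acquiring it succeeds immediately
    # even though lock_a is currently held.
    acquired = lock_b.acquire(False)  # non-blocking acquire
    print("second, unrelated lock acquired while the first is held: %s" % acquired)  # True
    if acquired:
        lock_b.release()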
Related
I'm learning about concurrent programming with Python.
In the following code, I seem to be having synchronization issues. How can I fix it?
import threading

N = 1000000
counter = 0

def increment():
    global counter
    for i in range(N):
        counter += 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)
Both threads are trying to modify counter at the same time, and sometimes they do; as a result, some of the increments are lost. Here is a simple-minded approach to solving that problem using threading.Lock:
import threading

N = 1000000
counter = 0

def increment(theLock):
    global counter
    for i in range(N):
        theLock.acquire()
        counter += 1
        theLock.release()

lock = threading.Lock()
t1 = threading.Thread(target=increment, args=[lock])
t2 = threading.Thread(target=increment, args=[lock])
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)
The theLock.acquire() and theLock.release() calls surround the code that must be protected so it runs in only one thread at a time. In your example the acquire and release could also surround the entire loop, but that would be the same as not using multithreading at all. See the threading documentation and, in particular, the Lock Objects section.
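A Lock also works as a context manager, so the same critical section can be written with a with block (a small sketch, equivalent to the acquire/release version above); the lock is then released even if the protected code raises:
def increment(theLock):
    global counter
    for i in range(N):
        # 'with' acquires the lock on entry and releases it on exit,
        # even if the body raises an exception.
        with theLock:
            counter += 1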
How would you go about combining threading.RLock with threading.Semaphore? Or does such a structure already exist?
In Python, there is a primitive for a reentrant lock, threading.RLock(), which allows the same thread to acquire the lock multiple times while blocking all other threads. There is also threading.Semaphore(N), which allows the lock to be acquired N times before blocking. How would one combine these two structures? I want up to N separate threads to be able to hold the lock at once, but I'd like each individual thread's hold on it to be reentrant.
So I guess a reentrant semaphore does not exist in the standard library. Here is the implementation I came up with; happy to entertain comments.
import datetime
import logging
import threading

class ReentrantSemaphore(object):
    '''A counting semaphore which allows threads to re-enter.'''
    def __init__(self, value=1):
        self.local = threading.local()
        self.sem = threading.Semaphore(value)

    def acquire(self):
        if not getattr(self.local, 'lock_level', 0):
            # We do not yet have the lock; acquire it.
            start = datetime.datetime.utcnow()
            self.sem.acquire()
            end = datetime.datetime.utcnow()
            if end - start > datetime.timedelta(seconds=3):
                logging.info("Took %d sec to lock." % (end - start).total_seconds())
            self.local.lock_time = end
            self.local.lock_level = 1
        else:
            # We already hold the lock; just increment the level for the
            # recursive call.
            self.local.lock_level += 1

    def release(self):
        if getattr(self.local, 'lock_level', 0) < 1:
            raise Exception("Trying to release a released lock.")
        self.local.lock_level -= 1
        if self.local.lock_level == 0:
            self.sem.release()

    __enter__ = acquire

    def __exit__(self, t, v, tb):
        self.release()
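A quick usage sketch (mine, not from the original post), assuming the ReentrantSemaphore class above: up to value threads may hold it at once, and a thread that already holds it may re-enter without blocking:
rsem = ReentrantSemaphore(2)  # at most two threads inside at a time

def inner():
    with rsem:  # re-entering from the same thread does not block
        print("working in %s" % threading.current_thread().name)

def outer():
    with rsem:
        inner()

threads = [threading.Thread(target=outer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()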
I need to start two threads, controlling which one starts first, and then have them alternate their jobs.
The following code works as expected with do_sleep = True, but it can fail with do_sleep = False.
How can I achieve the same result without using those ugly (and unreliable) sleeps?
The reason why it works with do_sleep = True is that:
Each worker thread gives time to the other thread to start before trying to acquire the lock and start the next job
There is a pause between the start of the first and the second worker that allows the first one to acquire the lock before the second is ready
With do_sleep = False it can fail because:
At the end of each job, each thread can try to acquire the lock for the next cycle before the other thread, executing two consecutive jobs instead of alternating
The second thread could acquire the lock before the first one
Here is the code:
import threading
import time
import random

do_sleep = True

def workerA(lock):
    for i in range(5):
        lock.acquire()
        print('Working A - %s' % i)
        time.sleep(random.uniform(0.2, 1))
        lock.release()
        if do_sleep: time.sleep(0.1)

def workerB(lock):
    for i in range(5):
        if do_sleep: time.sleep(0.1)
        lock.acquire()
        print('Working B - %s' % i)
        time.sleep(random.uniform(0.2, 1))
        lock.release()
        if do_sleep: time.sleep(0.1)

lock = threading.Lock()
t1 = threading.Thread(target=workerA, args=(lock, ))
t2 = threading.Thread(target=workerB, args=(lock, ))
t1.start()
if do_sleep: time.sleep(0.1)
t2.start()
t1.join()
t2.join()
print('done')
EDIT
Using a Queue as suggested by Mike doesn't help, because the first worker would finish the job without waiting for the second.
This is the wrong output of a version after replacing the Lock with a Queue:
Working A - 0
Working A - 1
Working B - 0
Working A - 2
Working B - 1
Working A - 3
Working B - 2
Working A - 4
Working B - 3
Working B - 4
done
This is the wrong output, obtained with do_sleep = False:
Working A - 0
Working A - 1
Working A - 2
Working A - 3
Working A - 4
Working B - 0
Working B - 1
Working B - 2
Working B - 3
Working B - 4
done
This is the correct output, obtained with do_sleep = True:
Working A - 0
Working B - 0
Working A - 1
Working B - 1
Working A - 2
Working B - 2
Working A - 3
Working B - 3
Working A - 4
Working B - 4
done
There are several ways to solve this. One relatively easy one is to use the lock to control access to a separate shared variable: call this other variable owner; it can be set to either A or B. Thread A can only start a job when owner is set to A, and thread B can only start a job when owner is set to B. The pseudo-code is then (assuming thread A here):
while True:
    while True:
        # Loop until I'm the owner
        lock.acquire()
        if owner == A:
            break
        lock.release()
    # Now I'm the owner. And I still hold the lock. Start the job.
    <Grab next job (or start job or finish job, whatever is required to remove it from contention)>
    owner = B
    lock.release()
    <Finish job if not already done. Go get the next one.>
The B thread does the same thing, only reversing the if owner == and owner = statements. And obviously you can parameterize it so that both threads actually run the same code.
EDIT
Here is the working version, with the suggested logic inside an object:
import threading
import time

def workerA(lock):
    for i in range(5):
        lock.acquire_for('A')
        print('Start A - %s' % i)
        time.sleep(0.5)
        print('End A - %s' % i)
        lock.release_to('B')

def workerB(lock):
    for i in range(5):
        lock.acquire_for('B')
        print('Start B - %s' % i)
        time.sleep(2)
        print('End B - %s' % i)
        lock.release_to('A')

class LockWithOwner:
    lock = threading.RLock()
    owner = 'A'

    def acquire_for(self, owner):
        n = 0
        while True:
            self.lock.acquire()
            if self.owner == owner:
                break
            n += 1
            self.lock.release()
            time.sleep(0.001)
        print('Waited for {} to be the owner {} times'.format(owner, n))

    def release_to(self, new_owner):
        self.owner = new_owner
        self.lock.release()

lock = LockWithOwner()
lock.owner = 'A'
t1 = threading.Thread(target=workerA, args=(lock, ))
t2 = threading.Thread(target=workerB, args=(lock, ))
t1.start()
t2.start()
t1.join()
t2.join()
print('done')
You can exclude the possibility of the wrong thread acquiring the lock, avoid relying on time.sleep(...) for correctness, and shorten your code at the same time by using Queue (two queues, one for each direction of communication):
import threading
import time
import random
from Queue import Queue  # Python 2; on Python 3 this is: from queue import Queue

def work_hard(name, i):
    print('start %s - %s' % (name, i))
    time.sleep(random.uniform(0.2, 1))
    print('end %s - %s' % (name, i))

def worker(name, q_mine, q_his):
    for i in range(5):
        q_mine.get()
        work_hard(name, i)
        q_his.put(1)

qAB = Queue()
qBA = Queue()
t1 = threading.Thread(target=worker, args=('A', qAB, qBA))
t2 = threading.Thread(target=worker, args=('B', qBA, qAB))
t1.start()
qAB.put(1)  # notice how you don't need time.sleep(...) even here
t2.start()
t1.join()
t2.join()
print('done')
It works as you specified. Alternatively you can use threading.Condition (a combination of acquire, release, wait and notify/notifyAll), but that will be more subtle, especially in terms of which thread goes first.
I have tried Gil Hamilton's answer and it doesn't work for me if I remove all the sleeps. I think it's because my 'main' thread keeps getting priority. I found that a better way to synchronize two or more threads is to use a Condition object.
Here is my working alternative lock object, with a Condition object inside:
import threading

class AltLock():
    def __init__(self, initial_thread):
        self.allow = initial_thread
        self.cond = threading.Condition()

    def acquire_for(self, thread):
        self.cond.acquire()
        while self.allow != thread:
            print("\tLOCK:", thread, "waiting")
            self.cond.wait()
        print("\tLOCK:", thread, "acquired")

    def release_to(self, thread):
        print("\tLOCK: releasing to", thread)
        self.allow = thread
        self.cond.notifyAll()
        self.cond.release()
And this is an example use case (the sleep statements in the thread are not required):
import threading
import time

class MyClass():
    def __init__(self):
        self.lock = AltLock("main")

    def _start(self):
        print("thread: Started, wait 2 second")
        time.sleep(2)
        print("---")
        self.lock.acquire_for("thread")
        time.sleep(2)
        print("---")
        print("thread: start lock acquired")
        self.lock.release_to("main")
        return 0

    def start(self):
        self.lock.acquire_for("main")
        self.thread = threading.Thread(target=self._start)
        self.thread.start()
        print("main: releasing lock")
        self.lock.release_to("thread")
        self.lock.acquire_for("main")
        print("main: lock acquired")

myclass = MyClass()
myclass.start()
myclass.lock.release_to("main")  # housekeeping
And this is stdout:
LOCK: main acquired
thread: Started, wait 2 second
main: releasing lock
LOCK: releasing to thread
LOCK: main waiting   # the 'main' thread tries to reacquire the lock immediately but is blocked in wait()
---
LOCK: thread acquired
---
thread: start lock acquired
LOCK: releasing to main
LOCK: main acquired
main: lock acquired
LOCK: releasing to main
I'm having trouble understanding threads in Python. I have this program:
import _thread, time

def print_loop():
    num = 0
    while 1:
        num = num + 1
        print(num)
        time.sleep(1)

_thread.start_new_thread(print_loop, ())
time.sleep(10)
My question is whether I need to close the print_loop thread, because it looks to me like both threads end when the main thread ends. Is this the proper way to handle threads?
First, avoid using the low-level API unless you absolutely have to. The threading module is preferred over _thread. In general in Python, avoid anything starting with an underscore.
Now, the method you are looking for is called join. I.e.
import time
from threading import Thread

stop = False

def print_loop():
    num = 0
    while not stop:
        num = num + 1
        print(num)
        time.sleep(1)

thread = Thread(target=print_loop)
thread.start()
time.sleep(10)
stop = True
thread.join()
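As an aside (not part of the original answer): if the background loop should simply be killed when the main thread finishes, much like what typically happens with _thread, the threading module also offers daemon threads, so no stop flag or join() is needed:
import time
from threading import Thread

def print_loop():
    num = 0
    while True:
        num = num + 1
        print(num)
        time.sleep(1)

thread = Thread(target=print_loop)
thread.daemon = True  # the interpreter exits without waiting for daemon threads
thread.start()
time.sleep(10)  # the main thread ends here and the daemon thread dies with it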
I'm trying to understand the basics of threading and concurrency. I want a simple case where two threads repeatedly try to access one shared resource.
The code:
import threading

class Thread(threading.Thread):
    def __init__(self, t, *args):
        threading.Thread.__init__(self, target=t, args=args)
        self.start()

count = 0
lock = threading.Lock()

def increment():
    global count
    lock.acquire()
    try:
        count += 1
    finally:
        lock.release()

def bye():
    while True:
        increment()

def hello_there():
    while True:
        increment()

def main():
    hello = Thread(hello_there)
    goodbye = Thread(bye)
    while True:
        print count

if __name__ == '__main__':
    main()
So, I have two threads, both trying to increment the counter. I thought that if thread 'A' called increment(), the lock would be established, preventing 'B' from accessing until 'A' has released.
Running the code makes it clear that this is not the case; you get all of the random, data-race-ish increments.
How exactly is the lock object used?
Additionally, I've tried putting the locks inside of the thread functions, but still no luck.
You can see that your locks are pretty much working as you are using them if you slow down the process and make them block a bit more. You had the right idea: surround the critical pieces of code with the lock. Here is a small adjustment to your example to show how each thread waits for the other to release the lock.
import threading
import time
import inspect

class Thread(threading.Thread):
    def __init__(self, t, *args):
        threading.Thread.__init__(self, target=t, args=args)
        self.start()

count = 0
lock = threading.Lock()

def incre():
    global count
    caller = inspect.getouterframes(inspect.currentframe())[1][3]
    print "Inside %s()" % caller
    print "Acquiring lock"
    with lock:
        print "Lock Acquired"
        count += 1
        time.sleep(2)

def bye():
    while count < 5:
        incre()

def hello_there():
    while count < 5:
        incre()

def main():
    hello = Thread(hello_there)
    goodbye = Thread(bye)

if __name__ == '__main__':
    main()
Sample output:
...
Inside hello_there()
Acquiring lock
Lock Acquired
Inside bye()
Acquiring lock
Lock Acquired
...
import threading

# global variable x
x = 0

def increment():
    """
    Function to increment global variable x.
    """
    global x
    x += 1

def thread_task():
    """
    Task for a thread: calls the increment function 100000 times.
    """
    for _ in range(100000):
        increment()

def main_task():
    global x
    # reset global variable x to 0
    x = 0
    # create threads
    t1 = threading.Thread(target=thread_task)
    t2 = threading.Thread(target=thread_task)
    # start threads
    t1.start()
    t2.start()
    # wait until the threads finish their job
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main_task()
        print("Iteration {0}: x = {1}".format(i, x))
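This last snippet uses no lock, so x can come out below 200000 on some iterations. As a follow-up sketch (an addition, not part of the original snippet), protecting the increment with a single shared Lock makes every iteration print exactly 200000:
import threading

x = 0
x_lock = threading.Lock()  # one Lock object shared by all threads

def increment():
    global x
    with x_lock:  # only one thread at a time may run the increment
        x += 1

def thread_task():
    for _ in range(100000):
        increment()

def main_task():
    global x
    x = 0
    t1 = threading.Thread(target=thread_task)
    t2 = threading.Thread(target=thread_task)
    t1.start()
    t2.start()
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main_task()
        print("Iteration {0}: x = {1}".format(i, x))  # always 200000 now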