I need to start two threads, controlling which one starts first, then having them alternating their jobs.
The following code works as expected with do_sleep = True, but it can fail with do_sleep = False.
How can I achieve the same result without using those ugly (and unreliable) sleeps?
The reason why it works with do_sleep = True is that:
Each worker thread gives time to the other thread to start before trying to acquire the lock and start the next job
There is a pause between the start of the first and the second worker that allows the first one to acquire the lock before the second is ready
With do_sleep = False it can fail because:
At the end of each job, each thread can try to acquire the lock for the next cycle before the other thread, executing two consecutive jobs instead of alternating
The second thread could acquire the lock before the first one
Here is the code:
import threading
import time
import random
do_sleep = True
def workerA(lock):
for i in range(5):
lock.acquire()
print('Working A - %s' % i)
time.sleep(random.uniform(0.2, 1))
lock.release()
if do_sleep: time.sleep(0.1)
def workerB(lock):
for i in range(5):
if do_sleep: time.sleep(0.1)
lock.acquire()
print('Working B - %s' % i)
time.sleep(random.uniform(0.2, 1))
lock.release()
if do_sleep: time.sleep(0.1)
lock = threading.Lock()
t1 = threading.Thread(target=workerA, args=(lock, ))
t2 = threading.Thread(target=workerB, args=(lock, ))
t1.start()
if do_sleep: time.sleep(0.1)
t2.start()
t1.join()
t2.join()
print('done')
EDIT
Using a Queue as suggested by Mike doesn't help, because the first worker would finish the job without waiting for the second.
This is the wrong output of a version after replacing the Lock with a Queue:
Working A - 0
Working A - 1
Working B - 0
Working A - 2
Working B - 1
Working A - 3
Working B - 2
Working A - 4
Working B - 3
Working B - 4
done
This is the wrong output, obtained with do_sleep = False:
Working A - 0
Working A - 1
Working A - 2
Working A - 3
Working A - 4
Working B - 0
Working B - 1
Working B - 2
Working B - 3
Working B - 4
done
This is the correct output, obtained with do_sleep = True:
Working A - 0
Working B - 0
Working A - 1
Working B - 1
Working A - 2
Working B - 2
Working A - 3
Working B - 3
Working A - 4
Working B - 4
done
Several ways to solve this. One relatively easy one is to use the lock to control access to a separate shared variable: call this other variable owner, it can either be set to A or B. Thread A can only start a job when owner is set to A, and thread B can only start a job when owner is set to B. Then the pseudo-code is (assume thread A here):
while True:
while True:
# Loop until I'm the owner
lock.acquire()
if owner == A:
break
lock.release()
# Now I'm the owner. And I still hold the lock. Start job.
<Grab next job (or start job or finish job, whatever is required to remove it from contention)>
owner = B
lock.release()
<Finish job if not already done. Go get next one>
The B thread does the same thing only reversing the if owner and owner = statements. And obviously you can parameterize it so that both actually just run the same code.
EDIT
Here is the working version, with the suggested logic inside an object:
import threading
import time
def workerA(lock):
for i in range(5):
lock.acquire_for('A')
print('Start A - %s' % i)
time.sleep(0.5)
print('End A - %s' % i)
lock.release_to('B')
def workerB(lock):
for i in range(5):
lock.acquire_for('B')
print('Start B - %s' % i)
time.sleep(2)
print('End B - %s' % i)
lock.release_to('A')
class LockWithOwner:
lock = threading.RLock()
owner = 'A'
def acquire_for(self, owner):
n = 0
while True:
self.lock.acquire()
if self.owner == owner:
break
n += 1
self.lock.release()
time.sleep(0.001)
print('Waited for {} to be the owner {} times'.format(owner, n))
def release_to(self, new_owner):
self.owner = new_owner
self.lock.release()
lock = LockWithOwner()
lock.owner = 'A'
t1 = threading.Thread(target=workerA, args=(lock, ))
t2 = threading.Thread(target=workerB, args=(lock, ))
t1.start()
t2.start()
t1.join()
t2.join()
print('done')
You can exclude the possibility of the wrong thread acquiring the lock, exclude relying on time.sleep(...) for correctness and shorten your code at the same time using Queue (two queues for both way communication):
import threading
import time
import random
from Queue import Queue
def work_hard(name, i):
print('start %s - %s' % (name, i))
time.sleep(random.uniform(0.2, 1))
print('end %s - %s' % (name, i))
def worker(name, q_mine, q_his):
for i in range(5):
q_mine.get()
work_hard(name, i)
q_his.put(1)
qAB = Queue()
qBA = Queue()
t1 = threading.Thread(target=worker, args=('A', qAB, qBA))
t2 = threading.Thread(target=worker, args=('B', qBA, qAB))
t1.start()
qAB.put(1) # notice how you don't need time.sleep(...) even here
t2.start()
t1.join()
t2.join()
print('done')
It works as you specified. Alternatively you can use threading.Condition (a combination of acquire, release, wait and notify/notifyAll), but that will be more subtle, especially in terms of which thread goes first.
I have tried Gil Hamilton's answer and it doesn't work for me if I remove all the sleeps. I think it's because my 'main' thread keeps getting the priority. I found out that a better way to synchronize two or more threads is to use conditional object.
Here is my working alternate lock object with conditional object inside
class AltLock():
def __init__(self, initial_thread):
self.allow = initial_thread
self.cond = threading.Condition()
def acquire_for(self, thread):
self.cond.acquire()
while self.allow!=thread:
print("\tLOCK:", thread, "waiting")
self.cond.wait()
print("\tLOCK:", thread, "acquired")
def release_to(self, thread):
print("\tLOCK: releasing to", thread)
self.allow=thread
self.cond.notifyAll()
self.cond.release()
And this is an example usecase (the sleep statements in the thread are not required):
class MyClass():
def __init__(self):
self.lock = AltLock("main")
def _start(self):
print("thread: Started, wait 2 second")
time.sleep(2)
print("---")
self.lock.acquire_for("thread")
time.sleep(2)
print("---")
print("thread: start lock acquired")
self.lock.release_to("main")
return 0
def start(self):
self.lock.acquire_for("main")
self.thread = threading.Thread(target = self._start, )
self.thread.start()
print("main: releasing lock")
self.lock.release_to("thread")
self.lock.acquire_for("main")
print("main: lock acquired")
myclass = MyClass()
myclass.start()
myclass.lock.release_to("main") # house keeping
And this is stdout:
LOCK: main acquired
thread: Started, wait 2 second
main: releasing lock
LOCK: releasing to thread
LOCK: main waiting // 'main' thread try to reacquire the lock immediately but get blocked by wait.
---
LOCK: thread acquired
---
thread: start lock acquired
LOCK: releasing to main
LOCK: main acquired
main: lock acquired
LOCK: releasing to main
Related
I'd always assumed a threading.Lock object would act as a mutex to prevent race conditions from occuring in a multithreading Python script, but I'm finding that either
my assumption is false (contradicting years of experience), or
Python itself has a bug (in, at the very least, versions 2.7-3.9) regarding this.
Theoretically, incrementing a value shared between two threads should be fine as long as you protect the critical section (ie the code incrementing that value) with a Lock, ie. mutex.
Running this code, I find mutexes in Python not to work as expected. Can anyone enlighten me on this?
#!/usr/bin/env python
from __future__ import print_function
import sys
import threading
import time
Stop = False
class T(threading.Thread):
def __init__(self,list_with_int):
self.mycount = 0
self.to_increment = list_with_int
super(T,self).__init__()
def run(self,):
while not Stop:
with threading.Lock():
self.to_increment[0] += 1
self.mycount += 1
intList = [0]
t1 = T(intList)
t2 = T(intList)
t1.start()
t2.start()
Delay = float(sys.argv[1]) if sys.argv[1:] else 3.0
time.sleep(Delay)
Stop = True
t1.join()
t2.join()
total_internal_counts = t1.mycount + t2.mycount
print("Compare:\n\t{total_internal_counts}\n\t{intList[0]}\n".format(**locals()))
assert total_internal_counts == intList[0]
It's possible that this is the answer, the lock object must be persistent and shared amongst the threads.
This works :
#!/usr/bin/env python
from __future__ import print_function
import sys
import threading
import time
Stop = False
class T(threading.Thread):
lock = threading.Lock()
def __init__(self,list_with_int):
self.mycount = 0
self.to_increment = list_with_int
super(T,self).__init__()
def run(self,):
while not Stop:
with self.lock:
self.to_increment[0] += 1
self.mycount += 1
intList = [0]
t1 = T(intList)
t2 = T(intList)
t1.start()
t2.start()
Delay = float(sys.argv[1]) if sys.argv[1:] else 3.0
time.sleep(Delay)
Stop = True
t1.join()
t2.join()
total_internal_counts = t1.mycount + t2.mycount
print("Compare:\n\t{total_internal_counts}\n\t{intList[0]}\n".format(**locals()))
It's possible that...the lock object must be persistent and shared amongst the threads
That's exactly right. threading.Lock() is a constructor call. The run loop in your original example created a new Lock object for each iteration:
while not Stop:
with threading.Lock(): //creates a new Lock object each time 'round
self.to_increment[0] += 1
You fixed it by creating a single Lock object that is used by all the threads and, by every iteration of the loop in each thread.
class T(threading.Thread):
// create one Lock object that will be shared by all instances of
// class T
//
lock = threading.Lock()
def run(self,):
while not Stop:
with self.lock: // lock and release the one Lock object.
self.to_increment[0] += 1
When you lock a Lock object, the only thing that it prevents is, it prevents other threads from locking the same Lock at the same time.
I have two tasks of which one is called every two seconds and the other one is called at random times. Both need to access an object that can't be called before the previous call is finished (if that happens I need to reboot the hardware device manually).
The object is from a class which allows the communication with a hardware device via sockets.
To do so I created a thread class, in order to run everything in the background and no other tasks are blocked. Within this class I implemented a queue: Two different functions put Tasks into the queue and a worker is supposed to execute the tasks !!NOT!! simultaneously.
As this entire project is a server it should run continuously.
Well here is my code and it obviously is not working. I would be very happy if anyone has a clue on how to solve this.
Update: 26.10.2020
In order to make my issue more clear I updated the code based on the answer from Artiom Kozyrev.
import time
from threading import Lock, Thread
import threading
from queue import Queue
class ThreadWorker(Thread):
def __init__(self, _lock: Lock, _queue: Queue, name: str):
# daemon=False means that process waits until all threads are finished
# (not only main one and garbage collector)
super().__init__(name=name, daemon=False)
# lock prevents several worker threads do work simultaneously
self.lock = _lock
# tasks are send from the main thread via Queue
self.queue = _queue
def do_work(self, job):
# lock context manager prevents other worker threads from working in the same time
with self.lock:
time.sleep(3)
print(f"{threading.current_thread().getName()}: {job * 10}")
def run(self):
while True:
job = self.queue.get()
# "poison pillow" - stop message from queue
if not job:
break
self.do_work(job)
def TimeStamp(msg):
tElapsed = (time.time() - tStart) # Display Thread Info
sElap = int(tElapsed)
msElap = int((tElapsed - sElap) * 1000)
usElap = int((tElapsed - sElap - msElap / 1000) * 1000000)
print(msg , ': ', sElap, 's', msElap, 'ms', usElap, 'us')
def f1():
TimeStamp("f1 start")
time.sleep(2)
TimeStamp("f1 finished")
def f2():
TimeStamp("f2 start")
time.sleep(6)
TimeStamp("f2 finished")
def insertf1():
for i in range(10):
q.put(f1())
time.sleep(2)
def insertf2():
for i in range(10):
time.sleep(10)
q.put(f2())
q = Queue()
lock = Lock()
workers = [ThreadWorker(lock, q, f"Th-worker-{i}") for i in range(5)] # create workers
for w in workers:
w.start()
tStart = time.time()
threading.Thread(target=insertf1, daemon=True).start()
threading.Thread(target=insertf2, daemon=True).start()
The output is:
f1 start : 0 s 0 ms 0 us
f1 finished : 2 s 2 ms 515 us
f1 start : 4 s 9 ms 335 us
f1 finished : 6 s 9 ms 932 us
f1 start : 8 s 17 ms 428 us
f2 start : 10 s 12 ms 794 us
f1 finished : 10 s 28 ms 633 us
f1 start : 12 s 29 ms 182 us
f1 finished : 14 s 34 ms 411 us
f2 finished : 16 s 19 ms 330 us
f1 started before f2 was finished, which is what needs to be avoided.
To do so you need to combine Queue and Lock. Lock will prevent worker-threads from working in the same time. Find code example below:
import time
from threading import Lock, Thread
import threading
from queue import Queue
class ThreadWorker(Thread):
def __init__(self, _lock: Lock, _queue: Queue, name: str):
# daemon=False means that process waits until all threads are finished
# (not only main one and garbage collector)
super().__init__(name=name, daemon=False)
# lock prevents several worker threads do work simultaneously
self.lock = _lock
# tasks are send from the main thread via Queue
self.queue = _queue
def do_work(self, job):
# lock context manager prevents other worker threads from working in the same time
with self.lock:
time.sleep(3)
print(f"{threading.current_thread().getName()}: {job * 10}")
def run(self):
while True:
job = self.queue.get()
# "poison pillow" - stop message from queue
if not job:
break
self.do_work(job)
if __name__ == '__main__':
q = Queue()
lock = Lock()
workers = [ThreadWorker(lock, q, f"Th-worker-{i}") for i in range(5)] # create workers
for w in workers:
w.start()
# produce tasks
for i in range(10):
q.put(i)
# stop tasks with "poison pillow"
for i in range(len(workers)):
q.put(None)
Edit based on additions to the question (Lock added)
The main idea is that you should not run f1 and f2 without Lock.
import time
from threading import Lock, Thread
import threading
from queue import Queue
class ThreadWorker(Thread):
def __init__(self, _lock: Lock, _queue: Queue, name: str):
# daemon=False means that process waits until all threads are finished
# (not only main one and garbage collector)
super().__init__(name=name, daemon=False)
# lock prevents several worker threads do work simultaneously
self.lock = _lock
# tasks are send from the main thread via Queue
self.queue = _queue
def do_work(self, f):
# lock context manager prevents other worker threads from working in the same time
with self.lock:
time.sleep(3)
print(f"{threading.current_thread().getName()}: {f()}")
def run(self):
while True:
job = self.queue.get()
# "poison pillow" - stop message from queue
if not job:
break
self.do_work(job)
def TimeStamp(msg):
tElapsed = (time.time() - tStart) # Display Thread Info
sElap = int(tElapsed)
msElap = int((tElapsed - sElap) * 1000)
usElap = int((tElapsed - sElap - msElap / 1000) * 1000000)
print(msg, ': ', sElap, 's', msElap, 'ms', usElap, 'us')
def f1():
TimeStamp("f1 start")
time.sleep(1)
TimeStamp("f1 finished")
return f"Func-1-{threading.current_thread().getName()}"
def f2():
TimeStamp("f2 start")
time.sleep(3)
TimeStamp("f2 finished")
return f"Func-2-{threading.current_thread().getName()}"
def insertf1():
for i in range(5):
q.put(f1) # do not run f1 here! Run it in worker thread with Lock
def insertf2():
for i in range(5):
q.put(f2) # do not run f2 here! Run it in worker thread with Lock
q = Queue()
lock = Lock()
workers = [ThreadWorker(lock, q, f"Th-worker-{i}") for i in range(5)] # create workers
for w in workers:
w.start()
tStart = time.time()
threading.Thread(target=insertf1, daemon=True).start()
threading.Thread(target=insertf2, daemon=True).start()
I am testing a method to run several tasks in parallel. These tasks will run in parallel threads and I want the tasks to repeat until a global variable is set. I am first trying threading to launch the parallel threads, and make sure they will work properly. What I have so far:
import threading
from IPython.display import clear_output
import time
i = 0
j = 0
def main():
global i
global j
t1 = threading.Thread(name = "task1", target = task1)
t2 = threading.Thread(name = "task2", target = task2)
t1.start()
t2.start()
def task1():
global i
i += 1
time.sleep(10)
t1 = threading.Thread(name = "task1", target = task1)
t1.start()
def task2():
global j
j -= 1
time.sleep(10)
t2 = threading.Thread(name = "task2", target = task2)
t2.start()
tmain = threading.Thread(name = "main", target = main)
tmain.start()
which starts a main thread that then starts two threads which run task1 and task2. To monitor the current threads and the values of i and j I run:
while(True):
clear_output(wait=True)
for thread in threading.enumerate():
print(thread)
print(i)
print(j)
time.sleep(0.1)
(all of this is being run in a Jupyter Notebook).
Running the script above, i noticed some unexpected results. I expect that at any given time, there should be at most two threads of task1 and task2, but instead I observe many more threads of task2 compared to task1. These are not ghost or finished threads, because the absolute values of i and j grow disproportionately. Two observations I made:
Again, i expect that there should be a symmetric number of threads for both task1 and task 2, and I also expect that the abslute values of i and j should grow more proportionately than they are. Any insight on how to mitigate this discrepancy or avoid this issue would be appreciated.
I ran your code in Jupyter and didn't have your problem.
<_MainThread(MainThread, started 139735228168000)>
<Thread(Thread-1, started daemon 139735083251456)>
<Heartbeat(Thread-2, started daemon 139735074858752)>
<HistorySavingThread(IPythonHistorySavingThread, started 139735049680640)>
<Thread(task2, started 139734638634752)>
<Thread(task1, started 139734680598272)>
<Thread(task2, started 139735041287936)>
<Thread(task1, started 139734076618496)>
<Thread(task1, started 139735032895232)>
<Thread(task2, started 139734672205568)>
<Thread(task1, started 139734655420160)>
<Thread(task2, started 139734630242048)>
272
-272
But as you already saw with your own code, there are multiple instances of each task running. So after a task 'has started itself anew' it takes some time before it kills itself.
A solution to your Jupyter problem could be to give the main function the control of restarting a killed tasked. This ensures that always only 1 thread of each task is running.
import threading
from IPython.display import clear_output
import time
i = 0
j = 0
main_stop = False
def task1():
global i
i += 1
time.sleep(4)
def task2():
global j
j -= 1
time.sleep(4)
def main():
global i
global j
t1 = threading.Thread(name="task1", target=task1)
t2 = threading.Thread(name="task2", target=task2)
t1.start()
t2.start()
while not main_stop:
if not t1.is_alive():
del t1
t1 = threading.Thread(name="task1", target=task1)
t1.start()
if not t2.is_alive():
del t2
t2 = threading.Thread(name="task2", target=task2)
t2.start()
# wait for tasks to complete
while t1.is_alive():
time.sleep(0.1)
while t2.is_alive():
time.sleep(0.1)
tmain = threading.Thread(name="main", target=main)
tmain.start()
run_time = 30 # seconds
end_time = time.time() + run_time
while time.time() < end_time:
clear_output(wait=True)
for thread in threading.enumerate():
print(thread)
print(i)
print(j)
time.sleep(0.1)
main_stop = True
# wait for main to complete
while tmain.is_alive():
time.sleep(0.1)
print('program completed')
I've started programming in Python a few weeks ago and was trying to use Semaphores to synchronize two simple threads, for learning purposes. Here is what I've got:
import threading
sem = threading.Semaphore()
def fun1():
while True:
sem.acquire()
print(1)
sem.release()
def fun2():
while True:
sem.acquire()
print(2)
sem.release()
t = threading.Thread(target = fun1)
t.start()
t2 = threading.Thread(target = fun2)
t2.start()
But it keeps printing just 1's. How can I intercale the prints?
It is working fine, its just that its printing too fast for you to see . Try putting a time.sleep() in both functions (a small amount) to sleep the thread for that much amount of time, to actually be able to see both 1 as well as 2.
Example -
import threading
import time
sem = threading.Semaphore()
def fun1():
while True:
sem.acquire()
print(1)
sem.release()
time.sleep(0.25)
def fun2():
while True:
sem.acquire()
print(2)
sem.release()
time.sleep(0.25)
t = threading.Thread(target = fun1)
t.start()
t2 = threading.Thread(target = fun2)
t2.start()
Also, you can use Lock/mutex method as follows:
import threading
import time
mutex = threading.Lock() # is equal to threading.Semaphore(1)
def fun1():
while True:
mutex.acquire()
print(1)
mutex.release()
time.sleep(.5)
def fun2():
while True:
mutex.acquire()
print(2)
mutex.release()
time.sleep(.5)
t1 = threading.Thread(target=fun1).start()
t2 = threading.Thread(target=fun2).start()
Simpler style using "with":
import threading
import time
mutex = threading.Lock() # is equal to threading.Semaphore(1)
def fun1():
while True:
with mutex:
print(1)
time.sleep(.5)
def fun2():
while True:
with mutex:
print(2)
time.sleep(.5)
t1 = threading.Thread(target=fun1).start()
t2 = threading.Thread(target=fun2).start()
[NOTE]:
The difference between mutex, semaphore, and lock
In fact, I want to find asyncio.Semaphores, not threading.Semaphore,
and I believe someone may want it too.
So, I decided to share the asyncio.Semaphores, hope you don't mind.
from asyncio import (
Task,
Semaphore,
)
import asyncio
from typing import List
async def shopping(sem: Semaphore):
while True:
async with sem:
print(shopping.__name__)
await asyncio.sleep(0.25) # Transfer control to the loop, and it will assign another job (is idle) to run.
async def coding(sem: Semaphore):
while True:
async with sem:
print(coding.__name__)
await asyncio.sleep(0.25)
async def main():
sem = Semaphore(value=1)
list_task: List[Task] = [asyncio.create_task(_coroutine(sem)) for _coroutine in (shopping, coding)]
"""
# Normally, we will wait until all the task has done, but that is impossible in your case.
for task in list_task:
await task
"""
await asyncio.sleep(2) # So, I let the main loop wait for 2 seconds, then close the program.
asyncio.run(main())
output
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
16*0.25 = 2
I used this code to demonstrate how 1 thread can use a Semaphore and the other thread will wait (non-blocking) until the Sempahore is available.
This was written using Python3.6; Not tested on any other version.
This will only work is the synchronization is being done from the same thread, IPC from separate processes will fail using this mechanism.
import threading
from time import sleep
sem = threading.Semaphore()
def fun1():
print("fun1 starting")
sem.acquire()
for loop in range(1,5):
print("Fun1 Working {}".format(loop))
sleep(1)
sem.release()
print("fun1 finished")
def fun2():
print("fun2 starting")
while not sem.acquire(blocking=False):
print("Fun2 No Semaphore available")
sleep(1)
else:
print("Got Semphore")
for loop in range(1, 5):
print("Fun2 Working {}".format(loop))
sleep(1)
sem.release()
t1 = threading.Thread(target = fun1)
t2 = threading.Thread(target = fun2)
t1.start()
t2.start()
t1.join()
t2.join()
print("All Threads done Exiting")
When I run this - I get the following output.
fun1 starting
Fun1 Working 1
fun2 starting
Fun2 No Semaphore available
Fun1 Working 2
Fun2 No Semaphore available
Fun1 Working 3
Fun2 No Semaphore available
Fun1 Working 4
Fun2 No Semaphore available
fun1 finished
Got Semphore
Fun2 Working 1
Fun2 Working 2
Fun2 Working 3
Fun2 Working 4
All Threads done Exiting
Existing answers are wastefully sleeping
I noticed that almost all answers use some form of time.sleep or asyncio.sleep, which blocks the thread. This should be avoided in real software, because blocking your thread for 0.25, 0.5 or 1 second is unnecessary/wasteful - you could be doing more processing, especially if your application is IO bound - it already blocks when it does IO AND you are introducing arbitrary delays (latency) in your processing time. If all your threads are sleeping, your app isn't doing anything. Also, these variables are quite arbitrary, which is why each answer has a different value they sleep (block the thread for).
The answers are using it as a way to get Python's bytecode interpreter to pre-empt the thread after each print line, so that it alternates deterministically between running the 2 threads. By default, the interpreter pre-empts a thread every 5ms (sys.getswitchinterval() returns 0.005), and remember that these threads never run in parallel, because of Python's GIL
Solution to problem
How can I intercale the prints?
So my answer would be, you do not want to use semaphores to print (or process) something in a certain order reliably, because you cannot rely on thread prioritization in Python. See Controlling scheduling priority of python threads? for more. time.sleep(arbitrarilyLargeEnoughNumber) doesn't really work when you have more than 2 concurrent pieces of code, since you don't know which one will run next - see * below. If the order matters, use a queue, and worker threads:
from threading import Thread
import queue
q = queue.Queue()
def enqueue():
while True:
q.put(1)
q.put(2)
def reader():
while True:
value = q.get()
print(value)
enqueuer_thread = Thread(target = enqueue)
reader_thread_1 = Thread(target = reader)
reader_thread_2 = Thread(target = reader)
reader_thread_3 = Thread(target = reader)
enqueuer_thread.start()
reader_thread_1.start()
reader_thread_2.start()
reader_thread_3.start()
...
Unfortunately in this problem, you don't get to use Semaphore.
*An extra check for you
If you try a modification of the top voted answer but with an extra function/thread to print(3), you'll get:
1
2
3
1
3
2
1
3
...
Within a few prints, the ordering is broken - it's 1-3-2.
You need to use 2 semaphores to do what you want to do, and you need to initialize them at 0.
import threading
SEM_FUN1 = threading.Semaphore(0)
SEM_FUN2 = threading.Semaphore(0)
def fun1() -> None:
for _ in range(5):
SEM_FUN1.acquire()
print(1)
SEM_FUN2.release()
def fun2() -> None:
for _ in range(5):
SEM_FUN2.acquire()
print(2)
SEM_FUN1.release()
threading.Thread(target=fun1).start()
threading.Thread(target=fun2).start()
SEM_FUN1.release() # Trigger fun1
Output:
This may have been asked in a similar context but I was unable to find an answer after about 20 minutes of searching, so I will ask.
I have written a Python script (lets say: scriptA.py) and a script (lets say scriptB.py)
In scriptB I want to call scriptA multiple times with different arguments, each time takes about an hour to run, (its a huge script, does lots of stuff.. don't worry about it) and I want to be able to run the scriptA with all the different arguments simultaneously, but I need to wait till ALL of them are done before continuing; my code:
import subprocess
#setup
do_setup()
#run scriptA
subprocess.call(scriptA + argumentsA)
subprocess.call(scriptA + argumentsB)
subprocess.call(scriptA + argumentsC)
#finish
do_finish()
I want to do run all the subprocess.call() at the same time, and then wait till they are all done, how should I do this?
I tried to use threading like the example here:
from threading import Thread
import subprocess
def call_script(args)
subprocess.call(args)
#run scriptA
t1 = Thread(target=call_script, args=(scriptA + argumentsA))
t2 = Thread(target=call_script, args=(scriptA + argumentsB))
t3 = Thread(target=call_script, args=(scriptA + argumentsC))
t1.start()
t2.start()
t3.start()
But I do not think this is right.
How do I know they have all finished running before going to my do_finish()?
Put the threads in a list and then use the Join method
threads = []
t = Thread(...)
threads.append(t)
...repeat as often as necessary...
# Start all threads
for x in threads:
x.start()
# Wait for all of them to finish
for x in threads:
x.join()
You need to use join method of Thread object in the end of the script.
t1 = Thread(target=call_script, args=(scriptA + argumentsA))
t2 = Thread(target=call_script, args=(scriptA + argumentsB))
t3 = Thread(target=call_script, args=(scriptA + argumentsC))
t1.start()
t2.start()
t3.start()
t1.join()
t2.join()
t3.join()
Thus the main thread will wait till t1, t2 and t3 finish execution.
In Python3, since Python 3.2 there is a new approach to reach the same result, that I personally prefer to the traditional thread creation/start/join, package concurrent.futures: https://docs.python.org/3/library/concurrent.futures.html
Using a ThreadPoolExecutor the code would be:
from concurrent.futures.thread import ThreadPoolExecutor
import time
def call_script(ordinal, arg):
print('Thread', ordinal, 'argument:', arg)
time.sleep(2)
print('Thread', ordinal, 'Finished')
args = ['argumentsA', 'argumentsB', 'argumentsC']
with ThreadPoolExecutor(max_workers=2) as executor:
ordinal = 1
for arg in args:
executor.submit(call_script, ordinal, arg)
ordinal += 1
print('All tasks has been finished')
The output of the previous code is something like:
Thread 1 argument: argumentsA
Thread 2 argument: argumentsB
Thread 1 Finished
Thread 2 Finished
Thread 3 argument: argumentsC
Thread 3 Finished
All tasks has been finished
One of the advantages is that you can control the throughput setting the max concurrent workers.
To use multiprocessing instead, you can use ProcessPoolExecutor.
I prefer using list comprehension based on an input list:
inputs = [scriptA + argumentsA, scriptA + argumentsB, ...]
threads = [Thread(target=call_script, args=(i)) for i in inputs]
[t.start() for t in threads]
[t.join() for t in threads]
You can have class something like below from which you can add 'n' number of functions or console_scripts you want to execute in parallel passion and start the execution and wait for all jobs to complete..
from multiprocessing import Process
class ProcessParallel(object):
"""
To Process the functions parallely
"""
def __init__(self, *jobs):
"""
"""
self.jobs = jobs
self.processes = []
def fork_processes(self):
"""
Creates the process objects for given function deligates
"""
for job in self.jobs:
proc = Process(target=job)
self.processes.append(proc)
def start_all(self):
"""
Starts the functions process all together.
"""
for proc in self.processes:
proc.start()
def join_all(self):
"""
Waits untill all the functions executed.
"""
for proc in self.processes:
proc.join()
def two_sum(a=2, b=2):
return a + b
def multiply(a=2, b=2):
return a * b
#How to run:
if __name__ == '__main__':
#note: two_sum, multiply can be replace with any python console scripts which
#you wanted to run parallel..
procs = ProcessParallel(two_sum, multiply)
#Add all the process in list
procs.fork_processes()
#starts process execution
procs.start_all()
#wait until all the process got executed
procs.join_all()
I just came across the same problem where I needed to wait for all the threads which were created using the for loop.I just tried out the following piece of code.It may not be the perfect solution but I thought it would be a simple solution to test:
for t in threading.enumerate():
try:
t.join()
except RuntimeError as err:
if 'cannot join current thread' in err:
continue
else:
raise
From the threading module documentation
There is a “main thread” object; this corresponds to the initial
thread of control in the Python program. It is not a daemon thread.
There is the possibility that “dummy thread objects” are created.
These are thread objects corresponding to “alien threads”, which are
threads of control started outside the threading module, such as
directly from C code. Dummy thread objects have limited functionality;
they are always considered alive and daemonic, and cannot be join()ed.
They are never deleted, since it is impossible to detect the
termination of alien threads.
So, to catch those two cases when you are not interested in keeping a list of the threads you create:
import threading as thrd
def alter_data(data, index):
data[index] *= 2
data = [0, 2, 6, 20]
for i, value in enumerate(data):
thrd.Thread(target=alter_data, args=[data, i]).start()
for thread in thrd.enumerate():
if thread.daemon:
continue
try:
thread.join()
except RuntimeError as err:
if 'cannot join current thread' in err.args[0]:
# catchs main thread
continue
else:
raise
Whereupon:
>>> print(data)
[0, 4, 12, 40]
Maybe, something like
for t in threading.enumerate():
if t.daemon:
t.join()
using only join can result in false-possitive interaction with thread. Like said in docs :
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in
seconds (or fractions thereof). As join() always returns None, you
must call isAlive() after join() to decide whether a timeout happened
– if the thread is still alive, the join() call timed out.
and illustrative piece of code:
threads = []
for name in some_data:
new = threading.Thread(
target=self.some_func,
args=(name,)
)
threads.append(new)
new.start()
over_threads = iter(threads)
curr_th = next(over_threads)
while True:
curr_th.join()
if curr_th.is_alive():
continue
try:
curr_th = next(over_threads)
except StopIteration:
break