I am testing a method to run several tasks in parallel. These tasks will run in parallel threads, and I want them to repeat until a global variable is set. I am first trying threading to launch the parallel threads and make sure they work properly. What I have so far:
import threading
from IPython.display import clear_output
import time

i = 0
j = 0

def main():
    global i
    global j
    t1 = threading.Thread(name="task1", target=task1)
    t2 = threading.Thread(name="task2", target=task2)
    t1.start()
    t2.start()

def task1():
    global i
    i += 1
    time.sleep(10)
    t1 = threading.Thread(name="task1", target=task1)
    t1.start()

def task2():
    global j
    j -= 1
    time.sleep(10)
    t2 = threading.Thread(name="task2", target=task2)
    t2.start()

tmain = threading.Thread(name="main", target=main)
tmain.start()
which starts a main thread that then starts two threads which run task1 and task2. To monitor the current threads and the values of i and j I run:
while True:
    clear_output(wait=True)
    for thread in threading.enumerate():
        print(thread)
    print(i)
    print(j)
    time.sleep(0.1)
(all of this is being run in a Jupyter Notebook).
Running the script above, I noticed some unexpected results. I expect that at any given time there should be at most two threads running task1 and task2, but instead I observe many more task2 threads than task1 threads. These are not ghost or finished threads, because the absolute values of i and j grow disproportionately.
Again, I expect a symmetric number of threads for task1 and task2, and I also expect the absolute values of i and j to grow roughly in proportion. Any insight on how to mitigate this discrepancy or avoid this issue would be appreciated.
I ran your code in Jupyter and didn't have your problem.
<_MainThread(MainThread, started 139735228168000)>
<Thread(Thread-1, started daemon 139735083251456)>
<Heartbeat(Thread-2, started daemon 139735074858752)>
<HistorySavingThread(IPythonHistorySavingThread, started 139735049680640)>
<Thread(task2, started 139734638634752)>
<Thread(task1, started 139734680598272)>
<Thread(task2, started 139735041287936)>
<Thread(task1, started 139734076618496)>
<Thread(task1, started 139735032895232)>
<Thread(task2, started 139734672205568)>
<Thread(task1, started 139734655420160)>
<Thread(task2, started 139734630242048)>
272
-272
But as you already saw with your own code, there are multiple instances of each task running: after a task has started its successor, it takes some time before its own thread dies.
A solution to your Jupyter problem could be to give the main function control of restarting a finished task. This ensures that only one thread of each task is ever running.
import threading
from IPython.display import clear_output
import time

i = 0
j = 0
main_stop = False

def task1():
    global i
    i += 1
    time.sleep(4)

def task2():
    global j
    j -= 1
    time.sleep(4)

def main():
    global i
    global j
    t1 = threading.Thread(name="task1", target=task1)
    t2 = threading.Thread(name="task2", target=task2)
    t1.start()
    t2.start()
    while not main_stop:
        if not t1.is_alive():
            del t1
            t1 = threading.Thread(name="task1", target=task1)
            t1.start()
        if not t2.is_alive():
            del t2
            t2 = threading.Thread(name="task2", target=task2)
            t2.start()
    # wait for tasks to complete
    while t1.is_alive():
        time.sleep(0.1)
    while t2.is_alive():
        time.sleep(0.1)

tmain = threading.Thread(name="main", target=main)
tmain.start()

run_time = 30  # seconds
end_time = time.time() + run_time
while time.time() < end_time:
    clear_output(wait=True)
    for thread in threading.enumerate():
        print(thread)
    print(i)
    print(j)
    time.sleep(0.1)

main_stop = True
# wait for main to complete
while tmain.is_alive():
    time.sleep(0.1)
print('program completed')
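If you prefer to avoid the module-level main_stop flag, the same supervisor structure works with a threading.Event as the stop signal. A minimal sketch of the idea, reduced to one task (the Event and the 0.1 s supervisor sleep are my additions, not part of the code above):

import threading
import time

stop_event = threading.Event()

def task1():
    time.sleep(4)

def main():
    t1 = threading.Thread(name="task1", target=task1)
    t1.start()
    while not stop_event.is_set():
        if not t1.is_alive():  # previous run finished: start the next one
            t1 = threading.Thread(name="task1", target=task1)
            t1.start()
        time.sleep(0.1)  # avoid busy-spinning in the supervisor loop
    t1.join()  # let the last run finish

tmain = threading.Thread(name="main", target=main)
tmain.start()
time.sleep(30)
stop_event.set()  # plays the role of `main_stop = True`
tmain.join()
print('program completed')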
Related
Hi, I am trying to make it so two threads share a variable that one of them changes, but I can't figure it out. This is an example of what I have:
import time
import threading

s = 0

def thing1():
    global s  # needed to modify the module-level variable
    time.sleep(1)
    s += 1

def thing2():
    print(s)

t = threading.Thread(target=thing1)
t.start()
t2 = threading.Thread(target=thing2)
t2.start()
When they run, thing2 prints 0, not the elapsed seconds. In my real code they run later; this is just the minimal code needed to show the problem.
You need to use a semaphore so that the two threads do not access the variable at the same time. Any two threads can access the same variable s by declaring it global.
import threading
import time

s = 0
sem = threading.Semaphore()

def thing1():
    global s
    for _ in range(3):
        time.sleep(1)
        sem.acquire()
        s += 1
        sem.release()

def thing2():
    global s
    for _ in range(3):
        time.sleep(1)
        sem.acquire()
        print(s)
        sem.release()

t = threading.Thread(target=thing1)
t.start()
t2 = threading.Thread(target=thing2)
t2.start()
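Note that the semaphore only serializes access; it does not by itself make thing2 wait until thing1 has written. If the requirement is specifically "print s only after it has been incremented", a threading.Event expresses that ordering directly. A minimal sketch (the Event-based approach is my suggestion, not part of the answer above):

import threading
import time

s = 0
updated = threading.Event()

def thing1():
    global s
    time.sleep(1)
    s += 1
    updated.set()  # signal that s now holds the new value

def thing2():
    updated.wait()  # block until thing1 has incremented s
    print(s)  # prints 1

threading.Thread(target=thing1).start()
threading.Thread(target=thing2).start()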
I run some jobs in parallel, which can sometime take a long time, so I want the main thread to report on the progress. For example, each hour.
Below is the simplified version of what I came up with. The code will run test_function in 2 threads with arguments from input_arguments. Every 5 seconds it will print % of the jobs finished.
import threading
import queue
import time

def test_function(x):
    time.sleep(4)
    print("Finished ", x)

num_processes = 2
input_arguments = range(10)

# Define a worker which will continuously execute function taking input parameters from the queue
def worker():
    while True:
        x = q.get()
        if x is None:
            break
        test_function(x)
        q.task_done()

# Initialize queue and the threads
q = queue.Queue()
threads = []
for i in range(num_processes):
    t = threading.Thread(target=worker)
    t.start()
    threads.append(t)

# Create a queue of input parameters for function
for item in input_arguments:
    q.put(item)

# Report progress every 5 seconds
report_progress(q)

# stop workers
for i in range(num_processes):
    q.put(None)
for t in threads:
    t.join()
where report_progress is defined as follows:
def report_progress(q):
    qsize_init = q.qsize()
    while not q.empty():
        time.sleep(5)
        portion_finished = 1 - q.qsize() / qsize_init
        print("run_parallel: {:.1%} jobs are finished".format(portion_finished))
However, I want to report the progress every hour instead of every 5 seconds, and with a one-hour sleep the program might sit idle for many minutes after all the jobs have finished.
Another possibility is to define report_progress differently:
def report_progress(q):
    qsize_init = q.qsize()
    time_start = time.time()
    while not q.empty():
        current_time = time.time()
        if current_time - time_start > 5:
            portion_finished = 1 - q.qsize() / qsize_init
            print("run_parallel: {:.1%} jobs are finished".format(portion_finished))
            time_start = time.time()
I am worried that constantly checking this condition will drain CPU resources; only a small amount, but over a scale of hours it could add up.
Is there a standard way of handling this?
Python: 3.6
For now I will use a simple solution, suggested in the comments by @Andriy Maletsky.
The main thread will check every few seconds whether the queue is empty, and print a progress message if more than an hour has passed since the last report.
time_between_reports = 3600
time_between_checks = 5

def report_progress_until_finished(q):
    qsize_init = q.qsize()
    last_report_time = time.time()
    while not q.empty():
        time_elapsed = time.time() - last_report_time
        if time_elapsed > time_between_reports:
            portion_finished = 1 - q.qsize() / qsize_init
            print("run_parallel: {:.1%} jobs are finished".format(portion_finished))
            last_report_time = time.time()
        time.sleep(time_between_checks)
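If even the 5-second polling bothers you, threading.Event.wait(timeout=...) gives you a sleep that can be interrupted: the reporter blocks for up to an hour at a time but wakes immediately when the main thread signals completion. A sketch under the same queue/worker setup as above (the stop Event and the use of q.join() are my additions):

import threading

def report_progress_event(q, stop):
    qsize_init = q.qsize()
    while not stop.wait(timeout=time_between_reports):  # returns True as soon as stop is set
        portion_finished = 1 - q.qsize() / qsize_init
        print("run_parallel: {:.1%} jobs are finished".format(portion_finished))

# In the main thread:
# stop = threading.Event()
# reporter = threading.Thread(target=report_progress_event, args=(q, stop), daemon=True)
# reporter.start()
# q.join()    # returns once every q.put() item has been matched by a q.task_done()
# stop.set()  # wake the reporter right away instead of waiting out the rest of the hour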
I'm learning about concurrent programming with Python.
In the following code, I seem to be having synchronizing issues. How can I fix it?
import threading

N = 1000000
counter = 0

def increment():
    global counter
    for i in range(N):
        counter += 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)
Both threads are trying to modify counter at the same time, and sometimes they do. That results in some of the increments not appearing. Here is a simple-minded approach to solving that problem using threading.Lock:
import threading

N = 1000000
counter = 0

def increment(theLock):
    global counter
    for i in range(N):
        theLock.acquire()
        counter += 1
        theLock.release()

lock = threading.Lock()
t1 = threading.Thread(target=increment, args=[lock])
t2 = threading.Thread(target=increment, args=[lock])
t1.start()
t2.start()
t1.join()
t2.join()
print(counter)
The theLock.acquire() and theLock.release() calls surround code that must be protected so that it runs in only one thread at a time. In your example the acquire and release could also surround the entire loop, but that would be the same as not using multithreading at all. See the threading documentation, and in particular the Lock Objects section.
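As an aside, Lock is also a context manager, so the acquire/release pair can be written with a with statement, which releases the lock even if the protected code raises. A sketch of the same increment function (relying on the N and counter definitions above):

def increment(the_lock):
    global counter
    for _ in range(N):
        with the_lock:  # acquired on entry, released on exit, even on exception
            counter += 1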
I need to start two threads, controlling which one starts first, and then have them alternate their jobs.
The following code works as expected with do_sleep = True, but it can fail with do_sleep = False.
How can I achieve the same result without using those ugly (and unreliable) sleeps?
The reason why it works with do_sleep = True is that:
Each worker thread gives time to the other thread to start before trying to acquire the lock and start the next job
There is a pause between the start of the first and the second worker that allows the first one to acquire the lock before the second is ready
With do_sleep = False it can fail because:
At the end of each job, each thread can try to acquire the lock for the next cycle before the other thread, executing two consecutive jobs instead of alternating
The second thread could acquire the lock before the first one
Here is the code:
import threading
import time
import random

do_sleep = True

def workerA(lock):
    for i in range(5):
        lock.acquire()
        print('Working A - %s' % i)
        time.sleep(random.uniform(0.2, 1))
        lock.release()
        if do_sleep: time.sleep(0.1)

def workerB(lock):
    for i in range(5):
        if do_sleep: time.sleep(0.1)
        lock.acquire()
        print('Working B - %s' % i)
        time.sleep(random.uniform(0.2, 1))
        lock.release()
        if do_sleep: time.sleep(0.1)

lock = threading.Lock()
t1 = threading.Thread(target=workerA, args=(lock, ))
t2 = threading.Thread(target=workerB, args=(lock, ))
t1.start()
if do_sleep: time.sleep(0.1)
t2.start()
t1.join()
t2.join()
print('done')
EDIT
Using a Queue as suggested by Mike doesn't help, because the first worker would finish the job without waiting for the second.
This is the wrong output of a version after replacing the Lock with a Queue:
Working A - 0
Working A - 1
Working B - 0
Working A - 2
Working B - 1
Working A - 3
Working B - 2
Working A - 4
Working B - 3
Working B - 4
done
This is the wrong output, obtained with do_sleep = False:
Working A - 0
Working A - 1
Working A - 2
Working A - 3
Working A - 4
Working B - 0
Working B - 1
Working B - 2
Working B - 3
Working B - 4
done
This is the correct output, obtained with do_sleep = True:
Working A - 0
Working B - 0
Working A - 1
Working B - 1
Working A - 2
Working B - 2
Working A - 3
Working B - 3
Working A - 4
Working B - 4
done
Several ways to solve this. One relatively easy one is to use the lock to control access to a separate shared variable: call this other variable owner, it can either be set to A or B. Thread A can only start a job when owner is set to A, and thread B can only start a job when owner is set to B. Then the pseudo-code is (assume thread A here):
while True:
    while True:
        # Loop until I'm the owner
        lock.acquire()
        if owner == A:
            break
        lock.release()
    # Now I'm the owner. And I still hold the lock. Start job.
    <Grab next job (or start job or finish job, whatever is required to remove it from contention)>
    owner = B
    lock.release()
    <Finish job if not already done. Go get next one>
The B thread does the same thing, only with the if owner == A test and the owner = B assignment reversed. And obviously you can parameterize it so that both actually just run the same code.
EDIT
Here is the working version, with the suggested logic inside an object:
import threading
import time

def workerA(lock):
    for i in range(5):
        lock.acquire_for('A')
        print('Start A - %s' % i)
        time.sleep(0.5)
        print('End A - %s' % i)
        lock.release_to('B')

def workerB(lock):
    for i in range(5):
        lock.acquire_for('B')
        print('Start B - %s' % i)
        time.sleep(2)
        print('End B - %s' % i)
        lock.release_to('A')

class LockWithOwner:
    lock = threading.RLock()
    owner = 'A'

    def acquire_for(self, owner):
        n = 0
        while True:
            self.lock.acquire()
            if self.owner == owner:
                break
            n += 1
            self.lock.release()
            time.sleep(0.001)
        print('Waited for {} to be the owner {} times'.format(owner, n))

    def release_to(self, new_owner):
        self.owner = new_owner
        self.lock.release()

lock = LockWithOwner()
lock.owner = 'A'
t1 = threading.Thread(target=workerA, args=(lock, ))
t2 = threading.Thread(target=workerB, args=(lock, ))
t1.start()
t2.start()
t1.join()
t2.join()
print('done')
You can exclude the possibility of the wrong thread acquiring the lock, stop relying on time.sleep(...) for correctness, and shorten your code at the same time by using a Queue (two queues for two-way communication):
import threading
import time
import random
from queue import Queue  # `from Queue import Queue` on Python 2

def work_hard(name, i):
    print('start %s - %s' % (name, i))
    time.sleep(random.uniform(0.2, 1))
    print('end %s - %s' % (name, i))

def worker(name, q_mine, q_his):
    for i in range(5):
        q_mine.get()  # wait for my turn
        work_hard(name, i)
        q_his.put(1)  # hand the turn to the other worker

qAB = Queue()
qBA = Queue()
t1 = threading.Thread(target=worker, args=('A', qAB, qBA))
t2 = threading.Thread(target=worker, args=('B', qBA, qAB))
t1.start()
qAB.put(1)  # notice how you don't need time.sleep(...) even here
t2.start()
t1.join()
t2.join()
print('done')
It works as you specified. Alternatively you can use threading.Condition (a combination of acquire, release, wait and notify/notifyAll), but that will be more subtle, especially in terms of which thread goes first.
I have tried Gil Hamilton's answer and it doesn't work for me if I remove all the sleeps. I think it's because my 'main' thread keeps getting the priority. I found that a better way to synchronize two or more threads is to use a Condition object.
Here is my working alternative lock object, with a Condition object inside:
import threading
import time

class AltLock():
    def __init__(self, initial_thread):
        self.allow = initial_thread
        self.cond = threading.Condition()

    def acquire_for(self, thread):
        self.cond.acquire()
        while self.allow != thread:
            print("\tLOCK:", thread, "waiting")
            self.cond.wait()
        print("\tLOCK:", thread, "acquired")

    def release_to(self, thread):
        print("\tLOCK: releasing to", thread)
        self.allow = thread
        self.cond.notify_all()
        self.cond.release()
And this is an example use case (the sleep statements in the thread are not required):
class MyClass():
    def __init__(self):
        self.lock = AltLock("main")

    def _start(self):
        print("thread: Started, wait 2 second")
        time.sleep(2)
        print("---")
        self.lock.acquire_for("thread")
        time.sleep(2)
        print("---")
        print("thread: start lock acquired")
        self.lock.release_to("main")
        return 0

    def start(self):
        self.lock.acquire_for("main")
        self.thread = threading.Thread(target=self._start)
        self.thread.start()
        print("main: releasing lock")
        self.lock.release_to("thread")
        self.lock.acquire_for("main")
        print("main: lock acquired")

myclass = MyClass()
myclass.start()
myclass.lock.release_to("main")  # housekeeping
And this is stdout:
LOCK: main acquired
thread: Started, wait 2 second
main: releasing lock
LOCK: releasing to thread
LOCK: main waiting // the 'main' thread tries to reacquire the lock immediately but gets blocked by wait()
---
LOCK: thread acquired
---
thread: start lock acquired
LOCK: releasing to main
LOCK: main acquired
main: lock acquired
LOCK: releasing to main
I've started programming in Python a few weeks ago and was trying to use Semaphores to synchronize two simple threads, for learning purposes. Here is what I've got:
import threading

sem = threading.Semaphore()

def fun1():
    while True:
        sem.acquire()
        print(1)
        sem.release()

def fun2():
    while True:
        sem.acquire()
        print(2)
        sem.release()

t = threading.Thread(target=fun1)
t.start()
t2 = threading.Thread(target=fun2)
t2.start()
But it keeps printing just 1's. How can I interleave the prints?
It is working fine; it's just printing too fast for you to see. Try putting a time.sleep() in both functions (a small amount) to pause each thread for that long, so that you can actually see both 1 and 2.
Example -
import threading
import time

sem = threading.Semaphore()

def fun1():
    while True:
        sem.acquire()
        print(1)
        sem.release()
        time.sleep(0.25)

def fun2():
    while True:
        sem.acquire()
        print(2)
        sem.release()
        time.sleep(0.25)

t = threading.Thread(target=fun1)
t.start()
t2 = threading.Thread(target=fun2)
t2.start()
Also, you can use the Lock/mutex method, as follows:
import threading
import time

mutex = threading.Lock()  # is equal to threading.Semaphore(1)

def fun1():
    while True:
        mutex.acquire()
        print(1)
        mutex.release()
        time.sleep(.5)

def fun2():
    while True:
        mutex.acquire()
        print(2)
        mutex.release()
        time.sleep(.5)

t1 = threading.Thread(target=fun1).start()
t2 = threading.Thread(target=fun2).start()
Simpler style using "with":
import threading
import time

mutex = threading.Lock()  # is equal to threading.Semaphore(1)

def fun1():
    while True:
        with mutex:
            print(1)
        time.sleep(.5)

def fun2():
    while True:
        with mutex:
            print(2)
        time.sleep(.5)

t1 = threading.Thread(target=fun1).start()
t2 = threading.Thread(target=fun2).start()
[NOTE]:
The difference between mutex, semaphore, and lock
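To make the linked distinction concrete: a Lock admits one holder at a time, while Semaphore(n) admits up to n concurrent holders. A small illustration (my example, not from the answers above):

import threading
import time

slots = threading.Semaphore(2)  # at most two threads inside the block at once

def worker(k):
    with slots:
        print("worker", k, "entered")
        time.sleep(1)
        print("worker", k, "leaving")

for k in range(4):
    threading.Thread(target=worker, args=(k,)).start()
# workers 0 and 1 enter immediately; 2 and 3 wait for a slot to free up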
In fact, I was looking for asyncio.Semaphore, not threading.Semaphore, and I believe someone else may want it too.
So, I decided to share the asyncio version; hope you don't mind.
from asyncio import (
    Task,
    Semaphore,
)
import asyncio
from typing import List

async def shopping(sem: Semaphore):
    while True:
        async with sem:
            print(shopping.__name__)
        await asyncio.sleep(0.25)  # Transfer control to the loop, and it will assign another job (if idle) to run.

async def coding(sem: Semaphore):
    while True:
        async with sem:
            print(coding.__name__)
        await asyncio.sleep(0.25)

async def main():
    sem = Semaphore(value=1)
    list_task: List[Task] = [asyncio.create_task(_coroutine(sem)) for _coroutine in (shopping, coding)]
    """
    # Normally, we would wait until all the tasks are done, but that is impossible in your case.
    for task in list_task:
        await task
    """
    await asyncio.sleep(2)  # So, let the main loop wait for 2 seconds, then close the program.

asyncio.run(main())
output
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
shopping
coding
16 prints at 0.25 s per round of the two tasks: 8 rounds * 0.25 s = 2 s, the length of the wait in main().
I used this code to demonstrate how one thread can use a Semaphore while the other thread waits (non-blocking) until the Semaphore is available.
This was written using Python 3.6; it has not been tested on any other version.
This will only work if the synchronization is being done within the same process; IPC between separate processes will fail using this mechanism.
import threading
from time import sleep

sem = threading.Semaphore()

def fun1():
    print("fun1 starting")
    sem.acquire()
    for loop in range(1, 5):
        print("Fun1 Working {}".format(loop))
        sleep(1)
    sem.release()
    print("fun1 finished")

def fun2():
    print("fun2 starting")
    while not sem.acquire(blocking=False):
        print("Fun2 No Semaphore available")
        sleep(1)
    else:
        print("Got Semaphore")
        for loop in range(1, 5):
            print("Fun2 Working {}".format(loop))
            sleep(1)
        sem.release()

t1 = threading.Thread(target=fun1)
t2 = threading.Thread(target=fun2)
t1.start()
t2.start()
t1.join()
t2.join()
print("All Threads done Exiting")
print("All Threads done Exiting")
When I run this - I get the following output.
fun1 starting
Fun1 Working 1
fun2 starting
Fun2 No Semaphore available
Fun1 Working 2
Fun2 No Semaphore available
Fun1 Working 3
Fun2 No Semaphore available
Fun1 Working 4
Fun2 No Semaphore available
fun1 finished
Got Semaphore
Fun2 Working 1
Fun2 Working 2
Fun2 Working 3
Fun2 Working 4
All Threads done Exiting
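For reference, Semaphore.acquire also accepts a timeout (Python 3.2+), so the poll-and-sleep loop in fun2 could instead block for up to a second per attempt. A sketch of just the waiting part (behaviorally close to, but not identical to, the loop above):

while not sem.acquire(timeout=1):  # returns False if not acquired within 1 second
    print("Fun2 No Semaphore available")
print("Got Semaphore")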
Existing answers are wastefully sleeping
I noticed that almost all answers use some form of time.sleep or asyncio.sleep, which blocks the thread. This should be avoided in real software, because blocking your thread for 0.25, 0.5 or 1 second is unnecessary and wasteful: you could be doing more processing, especially if your application is IO-bound (it already blocks when it does IO), and you are introducing arbitrary delays (latency) into your processing time. If all your threads are sleeping, your app isn't doing anything. Also, these sleep values are quite arbitrary, which is why every answer sleeps (blocks the thread) for a different amount of time.
The answers use sleeping as a way to get Python's bytecode interpreter to pre-empt the thread after each print line, so that it alternates deterministically between running the two threads. By default, the interpreter pre-empts a thread every 5 ms (sys.getswitchinterval() returns 0.005), and remember that these threads never run in parallel anyway, because of Python's GIL.
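You can inspect, and even tune, the switch interval yourself; note that shortening it makes switches more frequent but still does not make the interleaving deterministic:

import sys

print(sys.getswitchinterval())  # 0.005 by default: a thread may be pre-empted every ~5 ms
sys.setswitchinterval(0.001)  # more frequent switching, but ordering is still not guaranteed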
Solution to problem
How can I interleave the prints?
So my answer is: you do not want to use semaphores to print (or process) something in a certain order reliably, because you cannot rely on thread prioritization in Python. See Controlling scheduling priority of python threads? for more. time.sleep(arbitrarilyLargeEnoughNumber) doesn't really work when you have more than 2 concurrent pieces of code, since you don't know which one will run next - see * below. If the order matters, use a queue and worker threads:
from threading import Thread
import queue

q = queue.Queue()

def enqueue():
    while True:
        q.put(1)
        q.put(2)

def reader():
    while True:
        value = q.get()
        print(value)

enqueuer_thread = Thread(target=enqueue)
reader_thread_1 = Thread(target=reader)
reader_thread_2 = Thread(target=reader)
reader_thread_3 = Thread(target=reader)
enqueuer_thread.start()
reader_thread_1.start()
reader_thread_2.start()
reader_thread_3.start()
...
Unfortunately in this problem, you don't get to use Semaphore.
*An extra check for you
If you try a modification of the top voted answer but with an extra function/thread to print(3), you'll get:
1
2
3
1
3
2
1
3
...
Within a few prints, the ordering is broken - it's 1-3-2.
You need to use 2 semaphores to do what you want to do, and you need to initialize them at 0.
import threading

SEM_FUN1 = threading.Semaphore(0)
SEM_FUN2 = threading.Semaphore(0)

def fun1() -> None:
    for _ in range(5):
        SEM_FUN1.acquire()
        print(1)
        SEM_FUN2.release()

def fun2() -> None:
    for _ in range(5):
        SEM_FUN2.acquire()
        print(2)
        SEM_FUN1.release()

threading.Thread(target=fun1).start()
threading.Thread(target=fun2).start()
SEM_FUN1.release()  # Trigger fun1
Output: