Let's say I have two types of threads:
a single thread that runs every x minutes; let's call it the A thread,
and multiple threads that run all the time; call them the B threads. When the A thread calls do_something(), I want all B threads to wait until A finishes and then resume. I can't figure out what to use.
I tried threading.Condition with wait()/notifyAll(), but it did not work the way I want: once I put the Condition in, the threads run one by one, as if they were synchronized. I want them to run freely.
This is the sample code where I try to make them wait() and then notify them, but they run one by one, like join(). No idea what to use.
import time, threading

class ...
    check = True

    def xxx(self, g, con):
        for i in range(3):
            with con:
                if self.check:
                    con.wait()
                self.check = False
                time.sleep(3)
                print(g)

con = threading.Condition()
threading.Thread(target=xxx, args=('a', con,)).start()
threading.Thread(target=xxx, args=('b', con,)).start()
threading.Thread(target=xxx, args=('c', con,)).start()
time.sleep(2)
con.notifyAll()
Question: Blocking other Threads while one Thread is running
Instead of using threading.Condition(), this example uses threading.Barrier(...).
Used modules from docs.python.org:
module-threading
event-objects
barrier-objects
import time, threading
from threading import BrokenBarrierError

def worker_A(g, terminate, barrier):
    # Counter to simulate conditional workload
    do_something = 3
    while not terminate.is_set():
        if do_something == 0:
            # Reset the barrier and wait until n_waiting == 2
            barrier.reset()
            while not terminate.is_set() and barrier.n_waiting < 2:
                time.sleep(0.5)
            # Now the other threads are waiting at the barrier
            # Simulate workload ...
            print('worker_A barrier.broken={} n_waiting={}'
                  .format(barrier.broken, barrier.n_waiting))
            time.sleep(3)
            # Call the third barrier.wait() to release the barrier
            try:
                barrier.wait()
            except BrokenBarrierError:
                pass
            # Reset the counter to restart the simulated conditional workload
            do_something = 3
        else:
            # Count down and give the other threads a timeslice
            do_something -= 1
            time.sleep(0.5)

def worker_B(g, terminate, barrier):
    while not terminate.is_set():
        # Simulate workload ...
        print('worker_B({})'.format(g))
        time.sleep(1)
        # Block at barrier.wait() if the barrier is NOT in the broken state
        try:
            barrier.wait()
        except BrokenBarrierError:
            pass

if __name__ == "__main__":
    # Event to terminate all threads safely
    terminate = threading.Event()
    # Barrier to block the worker_B threads
    # We use 3 threads, therefore init with parties=3
    barrier = threading.Barrier(3)
    barrier.abort()

    # Create and start the threads
    threads = []
    for t in [(worker_A, 'a'), (worker_B, 'b'), (worker_B, 'c'), ]:
        threads.append(threading.Thread(target=t[0], args=(t[1], terminate, barrier,)))
        threads[-1].start()
        time.sleep(0.2)

    # Simulating the MAIN thread
    time.sleep(20)

    # Set the `terminate` Event,
    # and abort the barrier to force all threads to terminate
    print('Terminate...')
    terminate.set()
    barrier.abort()

    # Wait until all threads have terminated
    for t in threads:
        t.join()

    print('EXIT MAIN')
Tested with Python: 3.5
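If the only requirement is that the B threads pause while the A thread works and then resume, a simpler alternative to the Barrier above is a threading.Event used as a gate. This is only a minimal sketch (the names are illustrative, not from the original answer), and it does not make A wait for a B thread that is already in the middle of an iteration:
import threading, time

gate = threading.Event()
gate.set()  # gate open: B threads run freely

def worker_A():
    while True:
        time.sleep(5)      # A runs every x seconds
        gate.clear()       # close the gate before do_something()
        print('A working...')
        time.sleep(3)      # simulated do_something()
        gate.set()         # reopen the gate: B threads resume

def worker_B(name):
    while True:
        gate.wait()        # blocks only while the gate is closed
        print('B', name, 'working')
        time.sleep(1)

threading.Thread(target=worker_A, daemon=True).start()
for name in ('b', 'c', 'd'):
    threading.Thread(target=worker_B, args=(name,), daemon=True).start()
time.sleep(20)  # let the demo run for a while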
Related
I'm using multiprocessing to run workers on different files in parallel. The workers' results are put into a queue. A listener gets the results from the queue and writes them to a file.
Sometimes the listener runs into errors (of various origins). In this case it silently dies, but all other processes continue running (rather surprisingly, worker errors cause all processes to terminate).
I would like to stop all processes (workers, listener, etc.) when the listener catches an error. How can this be done?
The scheme of my code is as follows:
import sys
import multiprocessing as mp

def worker(file_path, q):
    ## do something
    q.put(1.)
    return True

def listener(q):
    while True:
        m = q.get()
        if m == 'kill':
            break
        else:
            try:
                ...  # do something and write to file
            except Exception as err:
                # raise error
                tb = sys.exc_info()[2]
                raise err.with_traceback(tb)

def main():
    manager = mp.Manager()
    q = manager.Queue(maxsize=3)
    with mp.Pool(5) as pool:
        watcher = pool.apply_async(listener, (q,))
        files = ['path_1', 'path_2', 'path_3']
        jobs = [pool.apply_async(worker, (p, q,)) for p in files]
        # fire off workers
        for job in jobs:
            job.get()
        # kill the listener when done
        q.put('kill')

# run
if __name__ == "__main__":
    main()
I tried introducing event = manager.Event() and using it as a flag in main():
## inside the pool, after starting workers
while True:
    if event.is_set():
        for job in jobs:
            job.terminate()
No success. Calling os._exit(1) in the listener's exception block raises a broken pipe error, but the processes are not killed.
I also tried setting daemon = True,
for job in jobs:
    job.daemon = True
Did not help.
In fact, to handle listener exceptions, I'm using a callback, as accepted by apply_async (so that the errors are not entirely silenced). This complicates the situation, but not much.
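For illustration, assuming the error_callback parameter of apply_async is what's meant here, the hook looks roughly like this (the handler name is made up, and pool, listener and q come from the scheme above):
def on_listener_error(err):
    # hypothetical handler: log the exception so it is not silently swallowed
    print('listener failed:', err)

watcher = pool.apply_async(listener, (q,), error_callback=on_listener_error)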
Thank you in advance.
As always there are many ways to accomplish what you're after, but I would probably suggest using an Event to signal that the processes should quit. I also would not use a Pool in this instance, as it only really simplifies things for simple cases where you need something like map. More complicated use cases quickly make it easier to just build your own "pool" with the functionality you need.
from multiprocessing import Process, Queue, Event
from random import random

def might_fail(a):
    assert(a > .001)

def worker(args_q: Queue, result_q: Queue, do_quit: Event):
    try:
        while not do_quit.is_set():
            args = args_q.get()
            if args is None:
                break
            else:
                # do something
                result_q.put(random())
    finally:  # signal that worker is exiting even if exception is raised
        result_q.put(None)  # signal listener that worker is exiting

def listener(result_q: Queue, do_quit: Event, n_workers: int):
    n_completed = 0
    while n_workers > 0:
        res = result_q.get()
        if res is None:
            n_workers -= 1
        else:
            n_completed += 1
            try:
                might_fail(res)
            except:
                do_quit.set()  # let main continue
                print(n_completed)
                raise  # reraise error after we signal others to stop
    do_quit.set()  # let main continue
    print(n_completed)

if __name__ == "__main__":
    args_q = Queue()
    result_q = Queue()
    do_quit = Event()
    n_workers = 4

    listener_p = Process(target=listener, args=(result_q, do_quit, n_workers))
    listener_p.start()

    for _ in range(n_workers):
        worker_p = Process(target=worker, args=(args_q, result_q, do_quit))
        worker_p.start()

    for _ in range(1000):
        args_q.put("some/file.txt")

    for _ in range(n_workers):
        args_q.put(None)

    do_quit.wait()
    print('done')
I am working with Python 3.6, trying to build a set of sub-threads that die when their parent dies, without the parent being the main thread.
Here is my question:
I need to kill a set of threads when the function from which I fired them ends.
I currently do that by launching them all with daemon = True, and when the main thread finishes, all the children die.
But now I need that functionality from a secondary thread. That is, all children should die when the secondary thread ends.
I need this without manual coding: no flag or anything like that, because the functions are already written.
I tried using ThreadPoolExecutor but didn't achieve anything :(
Here is a little example:
Thanks a lot!!!
import time
import threading

class FooHilo:
    def print_while(self):
        a = 0
        while True:
            print('im still alive')
            time.sleep(1)

    def main_control(self):
        t = threading.Thread(target=self.print_while)
        t.daemon = True
        t.start()
        b = 0
        while b < 5:
            b += 1
            print(b)
            time.sleep(1)
        print('Exit... wishing that the thread print_while also end... ')

    def start(self):
        t = threading.Thread(target=self.main_control)
        t.daemon = False
        t.start()

foo = FooHilo()
foo.start()

while True:
    a = input('Input anything:')
    break

print('END...')
Currently I have 3 processes A, B, C created under the main process. However, I would like to start B and C inside process A. Is that possible?
process.py
from multiprocessing import Process
import time

procs = {}

def test():
    print(procs)
    procs['B'].start()
    procs['C'].start()
    time.sleep(8)
    procs['B'].terminate()
    procs['C'].terminate()
    procs['B'].join()
    procs['C'].join()

def B():
    while True:
        print('+'*10)
        time.sleep(1)

def C():
    while True:
        print('-'*10)
        time.sleep(1)

procs['A'] = Process(target=test)
procs['B'] = Process(target=B)
procs['C'] = Process(target=C)
main.py
from process import *
print(procs)
procs['A'].start()
procs['A'].join()
And I get this error:
AssertionError: can only start a process object created by current process
Is there an alternative way to start processes B and C from A? Or can A send a signal to ask the master process to start B and C?
I would recommend using Event objects to do the synchronization. They let you trigger actions across processes. For instance:
from multiprocessing import Process, Event
import time

procs = {}

def test():
    print(procs)
    # Will let the main process know that it needs
    # to start the subprocesses
    procs['B'][1].set()
    procs['C'][1].set()
    time.sleep(3)
    # This will trigger the shutdown of the subprocesses.
    # This is cleaner than using terminate as it allows
    # you to clean up the processes if needed.
    procs['B'][1].set()
    procs['C'][1].set()

def B():
    # Event will be set once again when this process
    # needs to finish
    event = procs["B"][1]
    event.clear()
    while not event.is_set():
        print('+' * 10)
        time.sleep(1)

def C():
    # Event will be set once again when this process
    # needs to finish
    event = procs["C"][1]
    event.clear()
    while not event.is_set():
        print('-' * 10)
        time.sleep(1)

if __name__ == '__main__':
    procs['A'] = (Process(target=test), None)
    procs['B'] = (Process(target=B), Event())
    procs['C'] = (Process(target=C), Event())

    procs['A'][0].start()

    # Wait for the events to be set before starting the subprocesses
    procs['B'][1].wait()
    procs['B'][0].start()
    procs['C'][1].wait()
    procs['C'][0].start()

    # Join all the subprocesses in the process that created them.
    procs['A'][0].join()
    procs['B'][0].join()
    procs['C'][0].join()
Note that this code is not really clean; only one event is needed in this case, but you should get the main idea.
Also, process A is not really needed anymore: you could consider using callbacks instead. See for instance the concurrent.futures module if you want to chain some async actions.
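For example, here is a minimal sketch of that idea with concurrent.futures, using placeholder step functions instead of the real B and C work:
from concurrent.futures import ProcessPoolExecutor, wait

def step_a():
    # placeholder for whatever process A was doing
    return 'A finished'

def step_bc(tag, msg):
    # placeholder for the work of B and C
    print(tag, 'sees:', msg)

if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        future_a = executor.submit(step_a)
        msg = future_a.result()  # block until A is done
        # only then start B and C, and wait for both
        followers = [executor.submit(step_bc, tag, msg) for tag in ('B', 'C')]
        wait(followers)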
This may have been asked in a similar context but I was unable to find an answer after about 20 minutes of searching, so I will ask.
I have written a Python script (let's say scriptA.py) and another script (let's say scriptB.py).
In scriptB I want to call scriptA multiple times with different arguments. Each run takes about an hour (it's a huge script that does lots of stuff, don't worry about it), and I want to run scriptA with all the different arguments simultaneously, but I need to wait until ALL of them are done before continuing. My code:
import subprocess
#setup
do_setup()
#run scriptA
subprocess.call(scriptA + argumentsA)
subprocess.call(scriptA + argumentsB)
subprocess.call(scriptA + argumentsC)
#finish
do_finish()
I want to run all the subprocess.call() commands at the same time and then wait until they are all done. How should I do this?
I tried to use threading like the example here:
from threading import Thread
import subprocess

def call_script(args):
    subprocess.call(args)

#run scriptA
t1 = Thread(target=call_script, args=(scriptA + argumentsA))
t2 = Thread(target=call_script, args=(scriptA + argumentsB))
t3 = Thread(target=call_script, args=(scriptA + argumentsC))
t1.start()
t2.start()
t3.start()
But I do not think this is right.
How do I know they have all finished running before going to my do_finish()?
Put the threads in a list and then use the join method:
threads = []

t = Thread(...)
threads.append(t)

...repeat as often as necessary...

# Start all threads
for x in threads:
    x.start()

# Wait for all of them to finish
for x in threads:
    x.join()
You need to use the join method of the Thread object at the end of the script.
t1 = Thread(target=call_script, args=(scriptA + argumentsA))
t2 = Thread(target=call_script, args=(scriptA + argumentsB))
t3 = Thread(target=call_script, args=(scriptA + argumentsC))
t1.start()
t2.start()
t3.start()
t1.join()
t2.join()
t3.join()
Thus the main thread will wait till t1, t2 and t3 finish execution.
Since Python 3.2 there is a new approach to reach the same result, which I personally prefer to the traditional thread create/start/join: the concurrent.futures package: https://docs.python.org/3/library/concurrent.futures.html
Using a ThreadPoolExecutor the code would be:
from concurrent.futures import ThreadPoolExecutor
import time

def call_script(ordinal, arg):
    print('Thread', ordinal, 'argument:', arg)
    time.sleep(2)
    print('Thread', ordinal, 'Finished')

args = ['argumentsA', 'argumentsB', 'argumentsC']

with ThreadPoolExecutor(max_workers=2) as executor:
    ordinal = 1
    for arg in args:
        executor.submit(call_script, ordinal, arg)
        ordinal += 1

print('All tasks have been finished')
The output of the previous code is something like:
Thread 1 argument: argumentsA
Thread 2 argument: argumentsB
Thread 1 Finished
Thread 2 Finished
Thread 3 argument: argumentsC
Thread 3 Finished
All tasks have been finished
One of the advantages is that you can control the throughput by setting the maximum number of concurrent workers.
To use multiprocessing instead, you can use ProcessPoolExecutor.
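For example, a minimal sketch of the same pattern with ProcessPoolExecutor; the __main__ guard is needed because worker processes are spawned:
from concurrent.futures import ProcessPoolExecutor
import time

def call_script(ordinal, arg):
    print('Process', ordinal, 'argument:', arg)
    time.sleep(2)
    print('Process', ordinal, 'Finished')

if __name__ == '__main__':
    args = ['argumentsA', 'argumentsB', 'argumentsC']
    # Same pattern as above, but each task runs in its own process
    with ProcessPoolExecutor(max_workers=2) as executor:
        for ordinal, arg in enumerate(args, start=1):
            executor.submit(call_script, ordinal, arg)
    print('All tasks have been finished')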
I prefer using a list comprehension based on an input list:
inputs = [scriptA + argumentsA, scriptA + argumentsB, ...]
threads = [Thread(target=call_script, args=(i,)) for i in inputs]
[t.start() for t in threads]
[t.join() for t in threads]
You can have a class like the one below, to which you can add any number of functions or console scripts you want to execute in parallel, start the execution, and wait for all jobs to complete:
from multiprocessing import Process

class ProcessParallel(object):
    """
    To process the functions in parallel
    """
    def __init__(self, *jobs):
        """
        """
        self.jobs = jobs
        self.processes = []

    def fork_processes(self):
        """
        Creates the process objects for the given function delegates
        """
        for job in self.jobs:
            proc = Process(target=job)
            self.processes.append(proc)

    def start_all(self):
        """
        Starts all the function processes together.
        """
        for proc in self.processes:
            proc.start()

    def join_all(self):
        """
        Waits until all the functions have executed.
        """
        for proc in self.processes:
            proc.join()

def two_sum(a=2, b=2):
    return a + b

def multiply(a=2, b=2):
    return a * b

# How to run:
if __name__ == '__main__':
    # note: two_sum, multiply can be replaced with any python console scripts which
    # you want to run in parallel..
    procs = ProcessParallel(two_sum, multiply)
    # Add all the processes to the list
    procs.fork_processes()
    # start process execution
    procs.start_all()
    # wait until all the processes have finished
    procs.join_all()
I just came across the same problem, where I needed to wait for all the threads created in a for loop. I tried out the following piece of code. It may not be the perfect solution, but I thought it would be a simple one to test:
for t in threading.enumerate():
    try:
        t.join()
    except RuntimeError as err:
        if 'cannot join current thread' in err.args[0]:
            continue
        else:
            raise
From the threading module documentation
There is a “main thread” object; this corresponds to the initial
thread of control in the Python program. It is not a daemon thread.
There is the possibility that “dummy thread objects” are created.
These are thread objects corresponding to “alien threads”, which are
threads of control started outside the threading module, such as
directly from C code. Dummy thread objects have limited functionality;
they are always considered alive and daemonic, and cannot be join()ed.
They are never deleted, since it is impossible to detect the
termination of alien threads.
So, to catch those two cases when you are not interested in keeping a list of the threads you create:
import threading as thrd

def alter_data(data, index):
    data[index] *= 2

data = [0, 2, 6, 20]

for i, value in enumerate(data):
    thrd.Thread(target=alter_data, args=[data, i]).start()

for thread in thrd.enumerate():
    if thread.daemon:
        continue
    try:
        thread.join()
    except RuntimeError as err:
        if 'cannot join current thread' in err.args[0]:
            # catches the main thread
            continue
        else:
            raise
Whereupon:
>>> print(data)
[0, 4, 12, 40]
Maybe something like:
for t in threading.enumerate():
    if t.daemon:
        t.join()
Using only join can result in a false-positive interaction with the thread. As said in the docs:
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in
seconds (or fractions thereof). As join() always returns None, you
must call isAlive() after join() to decide whether a timeout happened
– if the thread is still alive, the join() call timed out.
And an illustrative piece of code:
threads = []
for name in some_data:
    new = threading.Thread(
        target=self.some_func,
        args=(name,)
    )
    threads.append(new)
    new.start()

over_threads = iter(threads)
curr_th = next(over_threads)
while True:
    curr_th.join()
    if curr_th.is_alive():
        continue
    try:
        curr_th = next(over_threads)
    except StopIteration:
        break
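For comparison, here is a minimal sketch that actually passes a timeout to join() and then checks is_alive(), as the quoted documentation describes (the worker function is a placeholder):
import threading
import time

def some_func(name):
    time.sleep(3)  # placeholder workload

t = threading.Thread(target=some_func, args=('a',))
t.start()

t.join(timeout=1)   # join() always returns None
if t.is_alive():    # so check is_alive() to know whether the timeout happened
    print('join() timed out; the thread is still running')
else:
    print('the thread finished within the timeout')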
I would like my while loop to block for at most 5 seconds on all the threads it creates in the for loop. However, the following code blocks on the threads one by one. How can I achieve my goal? Thanks.
threads = []
while True:
    for _ in range(3):
        newThread = threading.Thread(..)
        threads.append(newThread)
        newThread.start()
        newThread.join(5)
You need to use a condition variable (threading.Condition in Python). It allows you to wait for a predicate to become true. In your case the predicate is "all threads have finished work, or the timeout was exceeded". Here is code which creates ten threads and waits until they are finished, with a 5-second timeout. Verbose logs will help you:
import threading
import time
import logging

logging.basicConfig(
    format='%(threadName)s:%(message)s',
    level=logging.DEBUG,
)

NUM_OF_THREADS = 10
TIMEOUT = 5

def sleeping_thread(delay, cond):
    logging.debug("Hi, I'm going to delay by %d sec." % delay)
    time.sleep(delay)
    logging.debug("I was sleeping for %d sec." % delay)
    cond.acquire()
    logging.debug("Calling notify().")
    cond.notify()
    cond.release()

def create_sleeping_thread(delay, cond):
    return threading.Thread(target=sleeping_thread,
                            args=(delay, cond))

if __name__ == '__main__':
    cond = threading.Condition(threading.Lock())
    cond.acquire()
    working_counter = NUM_OF_THREADS

    for i in range(NUM_OF_THREADS):
        t = create_sleeping_thread(i, cond)
        t.start()

    start_time = time.time()
    while working_counter > 0 and (time.time() - start_time < TIMEOUT):
        cond.wait()
        working_counter -= 1
        logging.debug('%d workers still working', working_counter)

    cond.release()
    logging.debug('Finish waiting for threads (%d workers still working)',
                  working_counter)
Further information at comp.programming.threads FAQ.
One thing to do is start all the threads and then iterate over the list and join them, but I suppose this could still wait up to a total of 5 * (thread count) seconds. Alternatively, you could create one additional thread that simply waits for your threads indefinitely; then in your main thread you can just wait for that extra thread for 5 seconds, as sketched below.
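A minimal sketch of that second idea, with placeholder workers (the names and delays are illustrative):
import threading
import time

def worker(delay):
    time.sleep(delay)  # placeholder workload

workers = [threading.Thread(target=worker, args=(d,), daemon=True)
           for d in (1, 2, 10)]
for w in workers:
    w.start()

def wait_for_all():
    for w in workers:
        w.join()          # waits indefinitely for every worker

watcher = threading.Thread(target=wait_for_all, daemon=True)
watcher.start()
watcher.join(timeout=5)   # the main thread blocks for at most 5 seconds in total
print('still running:', [w.name for w in workers if w.is_alive()])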
Are you trying to spawn a thread every 5 seconds, except that if one of the already-running threads ends, you wish to spawn a new thread sooner? If so, you could use a threading.Event to signal when a worker thread ends, and use event.wait(timeout) to block for at most 5 seconds on the event:
import threading
import time
import logging

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s: %(message)s',
                    datefmt='%H:%M:%S')

def foo_event(n, e):
    time.sleep(n)
    name = threading.current_thread().name
    logger.info('{n}: setting event'.format(n=name))
    e.set()

def main():
    e = threading.Event()
    threads = []
    N = 5
    for i in range(3):
        t = threading.Thread(target=foo_event, args=(N + 1, e,),
                             name='worker-{i}'.format(i=i))
        threads.append(t)
        t.daemon = True
        t.start()
        logger.info('entering wait')
        e.wait(N)
        logger.info('exit wait')
        e.clear()

main()
yields
05:06:34: entering wait
05:06:39: exit wait <-- Wait 5 seconds
05:06:39: entering wait
05:06:40: worker-0: setting event
05:06:40: exit wait <-- Wait <5 seconds
05:06:40: entering wait
05:06:45: worker-1: setting event
05:06:45: exit wait <-- Wait 5 seconds