I'm running a thread pool that has an intermittent bug: sometimes it works, sometimes it gets stuck at the pool.join() part of this code. I've been at this for several days and cannot find any difference between the runs that work and the runs that hang. Please help...
Here's the code...
def run_thread_pool(functions_list):
# Make the Pool of workers
pool = ThreadPool() # left blank to default to machine number of cores
pool.map(run_function, functions_list)
# close the pool and wait for the work to finish
pool.close()
pool.join()
return
Similarly, this code is also randomly getting stuck at q.join():
def run_queue_block(methods_list, max_num_of_workers=20):
from views.console_output_handler import add_to_console_queue
'''
Runs methods on threads. Stores method returns in a list. Then outputs that list
after all methods in the list have been completed.
    :param methods_list: example ((method_name, args), (method_2, args), (method_3, args))
:param max_num_of_workers: The number of threads to use in the block.
:return: The full list of returns from each method.
'''
method_returns = []
log = StandardLogger(logger_name='run_queue_block')
# lock to serialize console output
lock = threading.Lock()
def _output(item):
# Make sure the whole print completes or threads can mix up output in one line.
with lock:
if item:
add_to_console_queue(item)
msg = threading.current_thread().name, item
log.log_debug(msg)
return
# The worker thread pulls an item from the queue and processes it
def _worker():
log = StandardLogger(logger_name='_worker')
while True:
try:
method, args = q.get() # Extract and unpack callable and arguments
except:
# we've hit a nonetype object.
break
if method is None:
break
item = method(*args) # Call callable with provided args and store result
method_returns.append(item)
_output(item)
q.task_done()
num_of_jobs = len(methods_list)
if num_of_jobs < max_num_of_workers:
max_num_of_workers = num_of_jobs
# Create the queue and thread pool.
q = Queue()
threads = []
# starts worker threads.
for i in range(max_num_of_workers):
t = threading.Thread(target=_worker)
t.daemon = True # thread dies when main thread (only non-daemon thread) exits.
t.start()
threads.append(t)
for method in methods_list:
q.put(method)
# block until all tasks are done
q.join()
# stop workers
for i in range(max_num_of_workers):
q.put(None)
for t in threads:
t.join()
return method_returns
I never know when it's going to work. It works most of the time, but most of the time is not good enough. What might possibly cause a bug like this?
You have to call shutdown on the concurrent.futures.ThreadPoolExecutor object. Then return the result of pool.map.
from concurrent.futures import ThreadPoolExecutor
def run_thread_pool(functions_list):
    # Make the pool of workers; max_workers defaults based on the machine's CPU count
    pool = ThreadPoolExecutor()
    results = pool.map(run_function, functions_list)
    # shut down the pool and wait for the work to finish
    pool.shutdown()
    return list(results)
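A slightly tidier variant, sketched here assuming the same run_function and functions_list as in the question, uses the executor as a context manager; leaving the with block calls shutdown(wait=True) automatically:
from concurrent.futures import ThreadPoolExecutor
def run_thread_pool(functions_list):
    # The with block shuts the executor down and waits for all tasks on exit
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_function, functions_list))
    return results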
I've simplified your code so that it does not need a Queue object or daemon threads. Check if it fits your requirement.
def run_queue_block(methods_list):
from views.console_output_handler import add_to_console_queue
'''
Runs methods on threads. Stores method returns in a list. Then outputs that list
after all methods in the list have been completed.
    :param methods_list: example ((method_name, args), (method_2, args), (method_3, args))
    :return: The full list of returns from each method.
'''
method_returns = []
log = StandardLogger(logger_name='run_queue_block')
# lock to serialize console output
lock = threading.Lock()
def _output(item):
# Make sure the whole print completes or threads can mix up output in one line.
with lock:
if item:
add_to_console_queue(item)
msg = threading.current_thread().name, item
log.log_debug(msg)
return
# The worker thread pulls an item from the queue and processes it
def _worker(method, *args, **kwargs):
log = StandardLogger(logger_name='_worker')
item = method(*args, **kwargs) # Call callable with provided args and store result
with lock:
method_returns.append(item)
_output(item)
threads = []
# starts worker threads.
for method, args in methods_list:
        t = threading.Thread(target=_worker, args=(method, *args))  # unpack args so _worker receives them as positionals
t.start()
threads.append(t)
# stop workers
for t in threads:
t.join()
return method_returns
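For clarity, here is a minimal usage sketch of this version; fetch and compute are hypothetical placeholder functions, and the project-specific add_to_console_queue and StandardLogger imports still need to resolve:
def fetch(url):
    return 'data from ' + url
def compute(x, y):
    return x + y
methods_list = [
    (fetch, ('http://example.com',)),  # each entry is (callable, args tuple)
    (compute, (2, 3)),
]
results = run_queue_block(methods_list)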
To allow your queue to join in your second example, you need to ensure that every item taken from the queue gets a matching task_done() call.
So in your _worker function, mark tasks as done even if they could not be processed; otherwise the queue's unfinished-task count never reaches zero, and q.join() will hang.
def _worker():
log = StandardLogger(logger_name='_worker')
while True:
try:
method, args = q.get() # Extract and unpack callable and arguments
except:
# we've hit a nonetype object.
q.task_done()
break
if method is None:
q.task_done()
break
item = method(*args) # Call callable with provided args and store result
method_returns.append(item)
_output(item)
q.task_done()
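The same reasoning applies if method(*args) can raise: the exception kills the worker thread before task_done() is reached, and q.join() hangs again. A minimal sketch of one way to guard against that, assuming the same q, method_returns and _output as above:
def _worker():
    while True:
        method, args = q.get()
        try:
            if method is None:
                break
            item = method(*args)
            method_returns.append(item)
            _output(item)
        finally:
            # task_done() runs whether the call succeeded, failed, or was the sentinel
            q.task_done()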
I'm using multiprocessing to run workers on different files in parallel. The workers' results are put into a queue. A listener gets the results from the queue and writes them to a file.
Sometimes the listener might run into errors (of various origins). In this case, the listener silently dies, but all other processes continue running (rather surprisingly, worker errors cause all processes to terminate).
I would like to stop all processes (workers, listener, etc.) when the listener catches an error. How can this be done?
The scheme of my code is as follows:
import sys
import multiprocessing as mp
def worker(file_path, q):
## do something
q.put(1.)
return True
def listener(q):
while True:
m = q.get()
if m == 'kill':
break
else:
try:
                pass  # do something and write to file
except Exception as err:
# raise error
tb = sys.exc_info()[2]
raise err.with_traceback(tb)
def main():
manager = mp.Manager()
q = manager.Queue(maxsize=3)
with mp.Pool(5) as pool:
watcher = pool.apply_async(listener, (q,))
files = ['path_1','path_2','path_3']
jobs = [ pool.apply_async(worker, (p,q,)) for p in files ]
# fire off workers
for job in jobs:
job.get()
# kill the listener when done
q.put('kill')
# run
if __name__ == "__main__":
main()
I tried introducing event = manager.Event() and using it as a flag in main():
## inside the pool, after starting workers
while True:
if event.is_set():
for job in jobs:
job.terminate()
No success. Calling os._exit(1) in the listener's exception block raises a broken pipe error, but the processes are not killed.
I also tried setting daemon = True,
for job in jobs:
job.daemon = True
Did not help.
In fact, to handle listener exceptions, I'm passing a callback to apply_async (so that they are not entirely silenced). This complicates the situation, but not by much.
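(For reference, a minimal sketch of the kind of callback wiring described above, assuming a hypothetical handle_listener_error function; this only makes the listener's failure visible, it does not stop the other processes:)
def handle_listener_error(err):
    # hypothetical handler: make the listener's exception visible instead of letting it die silently
    print('listener failed:', err)
watcher = pool.apply_async(listener, (q,), error_callback=handle_listener_error)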
Thank you in advance.
As always there are many ways to accomplish what you're after, but I would probably suggest using an Event to signal that the processes should quit. I also would not use a Pool in this instance, as it only really simplifies things for simple cases where you need something like map. More complicated use cases quickly make it easier to just build your own "pool" with the functionality you need.
from multiprocessing import Process, Queue, Event
from random import random
def might_fail(a):
assert(a > .001)
def worker(args_q: Queue, result_q: Queue, do_quit: Event):
try:
while not do_quit.is_set():
args = args_q.get()
if args is None:
break
else:
# do something
result_q.put(random())
finally: #signal that worker is exiting even if exception is raised
result_q.put(None) #signal listener that worker is exiting
def listener(result_q: Queue, do_quit: Event, n_workers: int):
n_completed = 0
while n_workers > 0:
res = result_q.get()
if res is None:
n_workers -= 1
else:
n_completed += 1
try:
might_fail(res)
except:
do_quit.set() #let main continue
print(n_completed)
raise #reraise error after we signal others to stop
do_quit.set() #let main continue
print(n_completed)
if __name__ == "__main__":
args_q = Queue()
result_q = Queue()
do_quit = Event()
n_workers = 4
listener_p = Process(target=listener, args=(result_q, do_quit, n_workers))
listener_p.start()
for _ in range(n_workers):
worker_p = Process(target=worker, args=(args_q, result_q, do_quit))
worker_p.start()
for _ in range(1000):
args_q.put("some/file.txt")
for _ in range(n_workers):
args_q.put(None)
do_quit.wait()
print('done')
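One caveat with the sketch above: args_q.get() blocks, so a worker only notices do_quit between items. If that matters, a hedged variant of the worker loop polls with a timeout (multiprocessing queues raise the standard-library queue.Empty on timeout; random is the same import as above):
from queue import Empty
def worker(args_q: Queue, result_q: Queue, do_quit: Event):
    try:
        while not do_quit.is_set():
            try:
                args = args_q.get(timeout=0.1)  # wake up periodically to re-check do_quit
            except Empty:
                continue
            if args is None:
                break
            # do something
            result_q.put(random())
    finally:
        result_q.put(None)  # still signal the listener that this worker is exiting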
I have a request manager that builds a queue and starts x worker threads (x currently == 1).
Each thread loops, getting elements from the queue and appending the results to a shared list.
If the queue is exhausted, the queue.Empty exception is caught, the current job is marked as done, and the thread should exit. This does work.
This block at the end of run(), however, seems to break things. The queue has an arbitrary length, and it might occur that the queue is longer than the actual number of results fetchable. In order to exit all threads early, a thread checks if the result it got has len == 0. If this is the case, the thread clears the queue of all items left, marks itself as done and exits.
if len(request_result) == 0:
with self.q.mutex:
self.q.queue.clear()
self.q.task_done()
return
My assumption was that every thread would then finish its current job and exit.
However, the execution of the main thread hangs at q.join() and I can't debug why. From the debugger it looks like the worker thread is not terminating, but that's just guessing.
I've read: Threading queue hangs in Python
but that does not solve the problem. I did, however, try setting q.unfinished_tasks to 0 manually, but that is not thread safe and will cause the program to crash when a thread calls task_done() after another thread has just set q.unfinished_tasks to 0.
class RequestManager:
def __init__(self, config=None):
self.config = config
def request_all_heroes(self):
q = queue.Queue()
result_list = []
# todo: get range max from highest hero ID.
for skip in [x * 100 for x in range(1, 3)]:
q.put_nowait(skip)
for _ in range(int(self.config["meta"]["number_of_threads"])):
RequestWorker(q=q,
config=self.config,
query_name='all_heroes',
shared_result_list=result_list).start()
q.join()
return [Hero(item) for sublist in result_list for item in sublist]
class RequestWorker(threading.Thread):
def __init__(self,
q=None,
config=None,
query_name="",
shared_result_list=None, *args, **kwargs):
self.q = q
self.config = config
self.query_file_path = self.config["files"][query_name]
self.shared_result_list = shared_result_list
super().__init__(*args, **kwargs)
def run(self):
keep_running = True
while keep_running:
try:
skip_number = self.q.get()
except queue.Empty:
self.q.task_done()
return
sr = SpecificRequest(config=self.config, skip=skip_number, query_file_path=self.query_file_path)
request_result = sr.do_specific_request()
if len(request_result) == 0:
with self.q.mutex:
self.q.queue.clear()
self.q.task_done()
return
self.shared_result_list.append(request_result)
self.q.task_done()
EDIT 1
if not self.q.empty():
skip_number = self.q.get()
else:
return
This works; unfortunately, it is plain wrong, because get is called after the check whether the queue is empty. This will cause problems at some point, because one thread can check, see an element in the queue, and another thread can snatch that last element in the meantime. Unlikely, but possible.
This question is now about why self.q.get() does not return.
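(For reference, a race-free form of the loop in EDIT 1 drops the empty() check and lets get itself report emptiness via get_nowait(); this is only a sketch of that pattern and addresses the race, not the hang being asked about:)
try:
    skip_number = self.q.get_nowait()  # raises queue.Empty right away instead of blocking forever
except queue.Empty:
    return
# ... rest of run() unchanged ...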
Until now I had been struggling to implement a proper dynamic multi-thread system. The idea is to spin up multiple new pools of sub-threads from the main thread (each pool has its own number of threads and queue size) to run functions, and the user can define whether the main thread should wait for a sub-thread to finish up or just move on to the next line after starting it. This multi-thread logic helps to extract data in parallel and at a high frequency.
The solution to my issue is shared below for everyone who wants it. If you have any doubts and questions, please let me know.
# -*- coding: utf-8 -*-
"""
Created on Mon Jul 5 00:00:51 2021
#author: Tahasanul Abraham
"""
#%% Initialization of Libraries
import sys, os, inspect
currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(currentdir)
sys.path.insert(0,parentdir)
parentdir_1up = os.path.dirname(parentdir)
sys.path.insert(0,parentdir_1up)
from queue import Queue
from threading import Thread, Lock
class Worker(Thread):
def __init__(self, tasks):
Thread.__init__(self)
self.tasks = tasks
self.daemon = True
self.lock = Lock()
self.start()
    def run(self):
        while True:
            func, args, kargs = self.tasks.get()
            try:
                # A plain string is treated as a control message; "terminate" stops this worker.
                if func.lower() == "terminate":
                    break
            except:
                # Anything without .lower() (i.e. an actual callable) raises AttributeError and is executed here.
                try:
                    with self.lock:
                        func(*args, **kargs)
                except Exception as exception:
                    print(exception)
            self.tasks.task_done()
class ThreadPool:
def __init__(self, num_threads, num_queue=None):
if num_queue is None or num_queue < num_threads:
num_queue = num_threads
self.tasks = Queue(num_queue)
self.threads = num_threads
for _ in range(num_threads): Worker(self.tasks)
# This function can be called to terminate all the worker threads of the queue
def terminate(self):
self.wait_completion()
for _ in range(self.threads): self.add_task("terminate")
return None
# This function can be called to add new work to the queue
def add_task(self, func, *args, **kargs):
self.tasks.put((func, args, kargs))
    # This function can be called to wait until all the workers are done processing the pending work. If this function is called, the main thread will not execute any new lines until all the workers are done with the pending work.
def wait_completion(self):
self.tasks.join()
    # This function can be called to check whether there is any pending/running work in the queue. If any work is pending, the call returns True; otherwise it returns False.
def is_alive(self):
if self.tasks.unfinished_tasks == 0:
return False
else:
return True
#%% Standalone Run
if __name__ == "__main__":
import time
def test_return(x,d):
print (str(x) + " - pool completed")
d[str(x)] = x
time.sleep(5)
    # 2 threads and a FIFO queue of size 1000000000
pool = ThreadPool(2,1000000000)
r ={}
for i in range(10):
pool.add_task(test_return, i, r)
print (str(i) + " - pool added")
print ("Waiting for completion")
pool.wait_completion()
print ("pool done")
    # 1 thread and a FIFO queue of size 2
pool = ThreadPool(1,2)
r ={}
for i in range(10):
pool.add_task(test_return, i, r)
print (str(i) + " - pool added")
print ("Waiting for completion")
pool.wait_completion()
print ("pool done")
    # 2 threads and a FIFO queue of size 1
pool = ThreadPool(2,1)
r ={}
for i in range(10):
pool.add_task(test_return, i, r)
print (str(i) + " - pool added")
print ("Waiting for completion")
pool.wait_completion()
print ("pool done")
Making a new Pool
Using the above classes, one can make a pool of their own choice with the number of parallel threads they want and the size of the queue. Example of creating a pool of 10 threads with a queue size of 200:
pool = ThreadPool(10,200)
Adding work to Pool
Once a pool is created, one can use pool.add_task to do sub-routine work. In my example version I used the pool to call a function with its arguments. For example, I called the test_return function with its arguments i and r.
pool.add_task(test_return, i, r)
Waiting for the pool to complete its work
If a pool is given some work to do, the user can either move on to other code lines or wait for the pool to finish its work before the next lines are executed. To wait for the pool to finish the work and then continue, a call to wait_completion is required. Example:
pool.wait_completion()
Terminate and close down the pool threads
Once the pool threads are no longer required, it is possible to terminate and close them down to free memory and release the blocked threads. This can be done by calling the following function.
pool.terminate()
Checking if there are any pending works from the pool
There is a function that can be called to check if there are any pending/running works in the queue. If any work is pending, the call will return Boolean True; otherwise it will return Boolean False. To check whether the pool is working or not, call the following function.
pool.is_alive()
I am trying to create a pipeline, but I have bad exit issues (zombies) and performance ones. I have created this generic class:
class Generator(Process):
'''
<function>: function to call. None value means that the current class will
be used as a template for another class, with <function> being defined
there
<input_queues> : Queue or list of Queue objects , which refer to the input
to <function>.
<output_queues> : Queue or list of Queue objects , which are used to pass
output
    <sema_to_acquire> : Semaphore or list of Semaphore objects, which are
    blocking generation while not released
    <sema_to_release> : Semaphore or list of Semaphore objects, which will be
    released after <function> is called
'''
def __init__(self, function=None, input_queues=None, output_queues=None, sema_to_acquire=None,
sema_to_release=None):
Process.__init__(self)
self.input_queues = input_queues
self.output_queues = output_queues
self.sema_to_acquire = sema_to_acquire
self.sema_to_release = sema_to_release
if function is not None:
self.function = function
def run(self):
if self.sema_to_release is not None:
try:
self.sema_to_release.release()
except AttributeError:
[sema.release() for sema in self.sema_to_release]
while True:
if self.sema_to_acquire is not None:
try:
self.sema_to_acquire.acquire()
except AttributeError:
[sema.acquire() for sema in self.sema_to_acquire]
if self.input_queues is not None:
try:
data = self.input_queues.get()
except AttributeError:
data = [queue.get() for queue in self.input_queues]
isiterable = True
try:
iter(data)
res = self.function(*tuple(data))
except TypeError, te:
res = self.function(data)
else:
res = self.function()
if self.output_queues is not None:
try:
if self.output_queues.full():
self.output_queues.get(res)
self.output_queues.put(res)
except AttributeError:
[queue.put(res) for queue in self.output_queues]
if self.sema_to_release is not None:
if self.sema_to_release is not None:
try:
self.sema_to_release.release()
except AttributeError:
[sema.release() for sema in self.sema_to_release]
to simulate a worker inside a pipeline. The Generator is meant to run an infinite while loop, in which a function is executed using input from n queues and the result is written to m queues. There are some semaphores which need to be acquired by a process before one iteration happens, and when the iteration finishes some other semaphores are released. So, for processes that need to run in parallel and produce input for one another, I pass 'crossed' semaphores as arguments, in order to force them to perform single iterations together. For processes which do not need to run in parallel, I do not use any conditions. An example (which I actually use, if anyone ignores the input functions) is the following:
import time
from multiprocess import Lock, Queue, Semaphore
print_lock = Lock()
_t_=0.5
def func0(data):
time.sleep(_t_)
print_lock.acquire()
print 'func0 sends',data
print_lock.release()
return data
def func1(data):
time.sleep(_t_)
print_lock.acquire()
print 'func1 receives and sends',data
print_lock.release()
return data
def func2(data):
time.sleep(_t_)
print_lock.acquire()
print 'func2 receives and sends',data
print_lock.release()
return data
def func3(*data):
print_lock.acquire()
print 'func3 receives',data
print_lock.release()
run_svm = Semaphore()
run_rf = Semaphore()
inp_rf = Queue()
inp_svm = Queue()
out_rf = Queue()
out_svm = Queue()
kin_stream = Queue()
res_mixed = Queue()
streamproc = Generator(func0,
input_queues=kin_stream,
output_queues=[inp_rf,
inp_svm])
streamproc.daemon = True
streamproc.start()
svm_class = Generator(func1,
input_queues=inp_svm,
output_queues=out_svm,
sema_to_acquire=run_svm,
sema_to_release=run_rf)
svm_class.daemon=True
svm_class.start()
rf_class = Generator(func2,
input_queues=inp_rf,
output_queues=out_rf,
sema_to_acquire=run_rf,
sema_to_release=run_svm)
rf_class.daemon=True
rf_class.start()
mixed_class = Generator(func3,
input_queues=[out_rf, out_svm])
mixed_class.daemon = True
mixed_class.start()
count = 1
while True:
kin_stream.put([count])
count+=1
time.sleep(1)
streamproc.join()
svm_class.join()
rf_class.join()
mixed_class.join()
This example gives:
func0 sends 1
func2 receives and sends 1
func1 receives and sends 1
func3 receives (1, 1)
func0 sends 2
func2 receives and sends 2
func1 receives and sends 2
func3 receives (2, 2)
func0 sends 3
func2 receives and sends 3
func1 receives and sends 3
func3 receives (3, 3)
...
All good. However, if I try to kill the main process, the other subprocesses are not guaranteed to terminate: the terminal might freeze, or the Python interpreter might remain running in the background (probably as zombies), and I have no clue why this is happening, as I have set the corresponding daemon flags to True.
Does anyone have a better idea of implementing this type of pipeline or can suggest a solution to this evil problem? Thank you all.
EDIT
Fixed the testing. The zombies still exist, however.
I was able to overcome this problem by introducing a termination queue as an additional argument to the given class and setting up a signal handler for the SIGINT interrupt, in order to stop the pipeline execution. I do not know if this is the most elegant way to get it working, but it works. Also, the way the signal handler is set is important, as it must be set before process.start() for some reason; if anyone knows why, they can comment. Furthermore, the signal handler is inherited by the subprocesses, so I have to put the join inside a try: ... except AssertionError: pass pattern, otherwise it will throw an error (again, if someone knows how to bypass this, please elaborate). Anyway, it works.
SOURCE CODE
import signal
import sys
from multiprocessing import Process, Queue
class Generator(Process):
'''
<term_queue>: Queue to write termination events, must be same for all
processes spawned
<function>: function to call. None value means that the current class will
be used as a template for another class, with <function> being defined
there
<input_queues> : Queue or list of Queue objects , which refer to the input
to <function>.
<output_queues> : Queue or list of Queue objects , which are used to pass
output
<sema_to_acquire> : Semaphore or list of Semaphore objects, which are
blocking function execution
<sema_to_release> : Semaphore or list of Semaphore objects, which will be
released after <function> is called
'''
def __init__(self, term_queue,
function=None, input_queues=None, output_queues=None, sema_to_acquire=None,
sema_to_release=None):
Process.__init__(self)
self.term_queue = term_queue
self.input_queues = input_queues
self.output_queues = output_queues
self.sema_to_acquire = sema_to_acquire
self.sema_to_release = sema_to_release
if function is not None:
self.function = function
def run(self):
if self.sema_to_release is not None:
try:
self.sema_to_release.release()
except AttributeError:
deb = [sema.release() for sema in self.sema_to_release]
while True:
if not self.term_queue.empty():
self.term_queue.put((self.name, 0))
break
try:
if self.sema_to_acquire is not None:
try:
self.sema_to_acquire.acquire()
except AttributeError:
deb = [sema.acquire() for sema in self.sema_to_acquire]
if self.input_queues is not None:
try:
data = self.input_queues.get()
except AttributeError:
data = tuple([queue.get()
for queue in self.input_queues])
res = self.function(data)
else:
res = self.function()
if self.output_queues is not None:
try:
if self.output_queues.full():
self.output_queues.get(res)
self.output_queues.put(res)
except AttributeError:
deb = [queue.put(res) for queue in self.output_queues]
if self.sema_to_release is not None:
if self.sema_to_release is not None:
try:
self.sema_to_release.release()
except AttributeError:
deb = [sema.release() for sema in self.sema_to_release]
except Exception as exc:
self.term_queue.put((self.name, exc))
break
def signal_handler(sig, frame, term_queue, processes):
'''
<term_queue> is the queue to write termination of the __main__
    <processes> is a dictionary holding all running processes
'''
term_queue.put((__name__, 'SIGINT'))
try:
[processes[key].join() for key in processes]
except AssertionError:
pass
sys.exit(0)
term_queue = Queue()
'''
initialize some Generators and add them to the <processes> dictionary
'''
signal.signal(signal.SIGINT, lambda sig,frame: signal_handler(sig,frame,
term_queue,processes))
[processes[key].start() for key in processes]
while True:
if not term_queue.empty():
[processes[key].join() for key in processes]
break
and the example is changed accordingly (comment if you want me to add it)
I have had to work on this issue as well, and indeed, passing some communication pipe or queue to the processes seems to be the easiest way to tell them to terminate.
However, the termination code can also take advantage of a finally: block in the main process; it will take care of any event, including signals.
If your processes are supposed to terminate at the same time as an object, you might also want to play with weakref.finalize, but it can be tricky.
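A rough sketch of that idea, reusing the term_queue and processes names from the answer above (the exact payload put on the queue is arbitrary):
if __name__ == '__main__':
    try:
        [processes[key].start() for key in processes]
        while True:
            if not term_queue.empty():
                break
    finally:
        # runs on normal exit, on an exception, and on KeyboardInterrupt (SIGINT)
        term_queue.put((__name__, 'terminate'))
        [processes[key].join() for key in processes]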
This may have been asked in a similar context but I was unable to find an answer after about 20 minutes of searching, so I will ask.
I have written a Python script (let's say scriptA.py) and another script (let's say scriptB.py).
In scriptB I want to call scriptA multiple times with different arguments. Each run takes about an hour (it's a huge script, does lots of stuff... don't worry about it), and I want to be able to run scriptA with all the different arguments simultaneously, but I need to wait till ALL of them are done before continuing; my code:
import subprocess
#setup
do_setup()
#run scriptA
subprocess.call(scriptA + argumentsA)
subprocess.call(scriptA + argumentsB)
subprocess.call(scriptA + argumentsC)
#finish
do_finish()
I want to run all the subprocess.call() commands at the same time and then wait till they are all done; how should I do this?
I tried to use threading like the example here:
from threading import Thread
import subprocess
def call_script(args):
subprocess.call(args)
#run scriptA
t1 = Thread(target=call_script, args=(scriptA + argumentsA))
t2 = Thread(target=call_script, args=(scriptA + argumentsB))
t3 = Thread(target=call_script, args=(scriptA + argumentsC))
t1.start()
t2.start()
t3.start()
But I do not think this is right.
How do I know they have all finished running before going to my do_finish()?
Put the threads in a list and then use the join method:
threads = []
t = Thread(...)
threads.append(t)
...repeat as often as necessary...
# Start all threads
for x in threads:
x.start()
# Wait for all of them to finish
for x in threads:
x.join()
You need to use the join method of the Thread object at the end of the script.
t1 = Thread(target=call_script, args=(scriptA + argumentsA))
t2 = Thread(target=call_script, args=(scriptA + argumentsB))
t3 = Thread(target=call_script, args=(scriptA + argumentsC))
t1.start()
t2.start()
t3.start()
t1.join()
t2.join()
t3.join()
Thus the main thread will wait till t1, t2 and t3 finish execution.
In Python 3, since Python 3.2, there is a new approach to reach the same result that I personally prefer to the traditional thread creation/start/join: the concurrent.futures package: https://docs.python.org/3/library/concurrent.futures.html
Using a ThreadPoolExecutor the code would be:
from concurrent.futures.thread import ThreadPoolExecutor
import time
def call_script(ordinal, arg):
print('Thread', ordinal, 'argument:', arg)
time.sleep(2)
print('Thread', ordinal, 'Finished')
args = ['argumentsA', 'argumentsB', 'argumentsC']
with ThreadPoolExecutor(max_workers=2) as executor:
ordinal = 1
for arg in args:
executor.submit(call_script, ordinal, arg)
ordinal += 1
print('All tasks have been finished')
The output of the previous code is something like:
Thread 1 argument: argumentsA
Thread 2 argument: argumentsB
Thread 1 Finished
Thread 2 Finished
Thread 3 argument: argumentsC
Thread 3 Finished
All tasks have been finished
One of the advantages is that you can control the throughput by setting the maximum number of concurrent workers.
To use multiprocessing instead, you can use ProcessPoolExecutor.
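For instance, a minimal sketch of the same loop with processes instead of threads; only the executor class changes, and the __main__ guard is needed because child processes re-import the module:
from concurrent.futures import ProcessPoolExecutor
import time
def call_script(ordinal, arg):
    print('Process', ordinal, 'argument:', arg)
    time.sleep(2)
    print('Process', ordinal, 'Finished')
if __name__ == '__main__':
    args = ['argumentsA', 'argumentsB', 'argumentsC']
    with ProcessPoolExecutor(max_workers=2) as executor:
        for ordinal, arg in enumerate(args, start=1):
            executor.submit(call_script, ordinal, arg)
    print('All tasks have been finished')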
I prefer using a list comprehension based on an input list:
inputs = [scriptA + argumentsA, scriptA + argumentsB, ...]
threads = [Thread(target=call_script, args=(i,)) for i in inputs]
[t.start() for t in threads]
[t.join() for t in threads]
You can have a class something like the one below, to which you can add any number of functions or console scripts you want to execute in parallel, start the execution, and wait for all jobs to complete.
from multiprocessing import Process
class ProcessParallel(object):
"""
To Process the functions parallely
"""
def __init__(self, *jobs):
"""
"""
self.jobs = jobs
self.processes = []
def fork_processes(self):
"""
        Creates the process objects for the given function delegates
"""
for job in self.jobs:
proc = Process(target=job)
self.processes.append(proc)
def start_all(self):
"""
Starts the functions process all together.
"""
for proc in self.processes:
proc.start()
def join_all(self):
"""
        Waits until all the functions have executed.
"""
for proc in self.processes:
proc.join()
def two_sum(a=2, b=2):
return a + b
def multiply(a=2, b=2):
return a * b
#How to run:
if __name__ == '__main__':
    #note: two_sum, multiply can be replaced with any python console scripts which
    #you want to run in parallel..
procs = ProcessParallel(two_sum, multiply)
    #Add all the processes to the list
procs.fork_processes()
#starts process execution
procs.start_all()
    #wait until all the processes have finished executing
procs.join_all()
I just came across the same problem, where I needed to wait for all the threads that were created using a for loop. I tried out the following piece of code. It may not be the perfect solution, but I thought it would be a simple solution to test:
for t in threading.enumerate():
try:
t.join()
except RuntimeError as err:
        if 'cannot join current thread' in err.args[0]:
continue
else:
raise
From the threading module documentation
There is a “main thread” object; this corresponds to the initial
thread of control in the Python program. It is not a daemon thread.
There is the possibility that “dummy thread objects” are created.
These are thread objects corresponding to “alien threads”, which are
threads of control started outside the threading module, such as
directly from C code. Dummy thread objects have limited functionality;
they are always considered alive and daemonic, and cannot be join()ed.
They are never deleted, since it is impossible to detect the
termination of alien threads.
So, to catch those two cases when you are not interested in keeping a list of the threads you create:
import threading as thrd
def alter_data(data, index):
data[index] *= 2
data = [0, 2, 6, 20]
for i, value in enumerate(data):
thrd.Thread(target=alter_data, args=[data, i]).start()
for thread in thrd.enumerate():
if thread.daemon:
continue
try:
thread.join()
except RuntimeError as err:
if 'cannot join current thread' in err.args[0]:
            # catches the main thread
continue
else:
raise
Whereupon:
>>> print(data)
[0, 4, 12, 40]
Maybe something like:
for t in threading.enumerate():
if t.daemon:
t.join()
Using only join can result in a false-positive interaction with a thread. As stated in the docs:
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in
seconds (or fractions thereof). As join() always returns None, you
must call isAlive() after join() to decide whether a timeout happened
– if the thread is still alive, the join() call timed out.
and an illustrative piece of code:
threads = []
for name in some_data:
new = threading.Thread(
target=self.some_func,
args=(name,)
)
threads.append(new)
new.start()
over_threads = iter(threads)
curr_th = next(over_threads)
while True:
    curr_th.join(timeout=1.0)  # join with a timeout so the is_alive() check below can actually trigger
if curr_th.is_alive():
continue
try:
curr_th = next(over_threads)
except StopIteration:
break