I have a multithreaded function that I would like a status bar for using tqdm. Is there an easy way to show a status bar with ThreadPoolExecutor? It is the parallelization part that is confusing me.
import concurrent.futures
def f(x):
return f**2
my_iter = range(1000000)
def run(f,my_iter):
with concurrent.futures.ThreadPoolExecutor() as executor:
function = list(executor.map(f, my_iter))
return results
run(f, my_iter) # wrap tqdr around this function?
You can wrap tqdm around the executor as the following to track the progress:
list(tqdm(executor.map(f, iter), total=len(iter))
Here is your example:
import time
import concurrent.futures
from tqdm import tqdm
def f(x):
time.sleep(0.001) # to visualize the progress
return x**2
def run(f, my_iter):
with concurrent.futures.ThreadPoolExecutor() as executor:
results = list(tqdm(executor.map(f, my_iter), total=len(my_iter)))
return results
my_iter = range(100000)
run(f, my_iter)
And the result is like this:
16%|██▏ | 15707/100000 [00:00<00:02, 31312.54it/s]
The problem with the accepted answer is that the ThreadPoolExecutor.map function is obliged to generate results not in the order that they become available. So if the first invocation of myfunc happens to be, for example, the last one to complete, the progress bar will go from 0% to 100% all at once and only when all of the calls have completed. Much better would be to use ThreadPoolExecutor.submit with as_completed:
import time
import concurrent.futures
from tqdm import tqdm
def f(x):
time.sleep(0.001) # to visualize the progress
return x**2
def run(f, my_iter):
l = len(my_iter)
with tqdm(total=l) as pbar:
# let's give it some more threads:
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
futures = {executor.submit(f, arg): arg for arg in my_iter}
results = {}
for future in concurrent.futures.as_completed(futures):
arg = futures[future]
results[arg] = future.result()
pbar.update(1)
print(321, results[321])
my_iter = range(100000)
run(f, my_iter)
Prints:
321 103041
This is just the general idea. Depending upon the type of my_iter, it may not be possible to directly take apply the len function directly to it without first converting it into a list. The main point is to use submit with as_completed.
Most short way, i think:
with ThreadPoolExecutor(max_workers=20) as executor:
results = list(tqdm(executor.map(myfunc, range(len(my_array))), total=len(my_array)))
tried the example but progress bar fails still, and I find this post, seems useful in short way to use:
def tqdm_parallel_map(fn, *iterables):
""" use tqdm to show progress"""
executor = concurrent.futures.ProcessPoolExecutor()
futures_list = []
for iterable in iterables:
futures_list += [executor.submit(fn, i) for i in iterable]
for f in tqdm(concurrent.futures.as_completed(futures_list), total=len(futures_list)):
yield f.result()
def multi_cpu_dispatcher_process_tqdm(data_list, single_job_fn):
""" multi cpu dispatcher """
output = []
for result in tqdm_parallel_map(single_job_fn, data_list):
output += result
return output
I find more intuitive to use the update() method of tqdm, we keep an human readable structure:
with tqdm(total=len(mylist)) as progress:
with ThreadPoolExecutor() as executor:
for __ in executor.map(fun, mylist):
progress.update() # We update the progress bar each time that a job finish
Since I don't care about the output of fun I use __ as throwaway variable.
Related
My TQDM progress bar doesn't show during my multithreaded process, I only see it after the process is finished
Here is a way to reproduce the problem
I coded these two methods
from concurrent.futures import ProcessPoolExecutor
import sys
from colorama import Fore
def parallelize(desc, func, array, max_workers):
with ProcessPoolExecutor(max_workers=max_workers) as executor:
output_data = list(progress_bar(desc, list(executor.map(func,array))))
return output_data
def progress_bar(desc, array):
return tqdm(array,
total=len(array),
file=sys.stdout,
ascii=' >',
desc=desc,
bar_format="%s{l_bar}%s{bar:30}%s{r_bar}" % (Fore.RESET, Fore.BLUE, Fore.RESET))
you can test it this way
from tqdm import tqdm
test = range(int(1e4))
def identity(x):
return x
parallelize("", identity, test, 2)
It should print this (00:00) but the process takes around 3sc
100%|>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>| 10000/10000 [00:00<00:00, 3954279.25it/s]
Thanks for the help
I think this is cause when you call your progress bar
output_data = list(progress_bar(desc, list(executor.map(func,array))))
python first executor.map(func, array) and only then pass the results to progress_bar. It won't be the same but I can share with you a boiler plate of how to parallelize a python function.
from joblib import Parallel, delayed
def func(a):
# Do something
# Parallelize the call
Parallel(n_jobs=-1)(delayed(func)(a) for a in tqdm(array, total=len(array))
replaced this method and it worked
def parallelize(desc, func, array, max_workers):
return Parallel(n_jobs=max_workers)(delayed(func)(a) for a in progress_bar(desc, array))
I doing 100 iterations of the function model so, i tried using multiprocessing to distribute the tasks and for getting the final output I tried using queue but it takes too much time, failing the purpose of multiprocessing. How to solve this problem?
def model(X,Y):
ada_clf={}
pred1={}
auc_final=[]
for iteration in range(100):
ada_clf[iteration] = AdaBoostClassifier(DecisionTreeClassifier(),n_estimators=1000,learning_rate=0.001)
ada_clf[iteration].fit(X,Y)
pred1[iteration]=ada_clf[iteration].predict(test1)
individuallabelsfromada1=[]
for i in range(len(test1)):
individuallabelsfromada1.append([])
for j in range(100):
individuallabelsfromada1[i].append(pred1[j][i])
final_labels_ada1=[]
for each in individuallabelsfromada1:
final_labels_ada1.append(find_majority(each))
final=pd.Series(final_labels_ada1)
temp_arr=np.array(final)
total_labels2=pd.Series(temp_arr)
fpr, tpr, thresholds = roc_curve(y_test, total_labels2, pos_label=1)
auc_final.append(auc(fpr,tpr))
q.put(total_labels2)
q1.put(auc_final)
q2.put(ada_clf)
print('done')
overall_labels={}
final_auc={}
final_ada_clf={}
processes=[]
q=Queue()
q1=Queue()
q2=Queue()
for iteration in range(100):
if __name__=='__main__':
p=multiprocessing.Process(target=model,args=(x_train,y_labels,q,q1,q2,))
overall_labels[iteration]=q.get()
final_auc[iteration]=q1.get()
final_ada_clf[iteration]=q2.get()
p.start()
processes.append(p)
for each in processes:
each.join()
Below is my edited version, but returns only single output, i tried using multiple output but could not get it, so settled for only single output i.e. total_labels2:-
##code before this is same as before, only thing changed is arguments of model from def model(X,Y) to def model(repeat,X,Y)
total_labels2 = pd.Series(temp_arr)
return (repeat,total_labels2)
def get_result(total_labels2):
global testover_forall
testover_forall.append(total_labels2)
if __name__ == '__main__':
import multiprocessing as mp
testover_forall = []
pool = mp.Pool(40)
for repeat in range(100):
pool.apply_async(bound_model, args= repeat, x_train, y_train), callback= get_result)
pool.close()
pool.join()
repetations_index=[]
for i in range(100):
repetations_index.append(testover_forall[i][0])
final_last_labels = {}
for i in range(100):
temp = str(i)
final_last_labels[temp] = testover_forall[repetations_index[i]][1]
totally_last_labels=[]
for each in final_last_labels:
temp=np.array(final_last_labels[each])
totally_last_labels.append(temp)
See my comments (actually questions) to your post.
You should be using a multiprocessing pool to limit the number of processes that you create to the number of CPU cores that you have. This will also make it easier to get return values back from your model function instead of writing results to 3 different queues (and you could have written a tuple of 3 values to just one queue). You will, of course, require other import statements and code. Given your use of numpy and other libraries, which may be implemented in C Language, you could also retry running this using threading to see if that helps or hurts performance. Do this by replacing ProcessPoolExecutor with ThreadPoolExecutor in the two places it is referenced.
Note
Any changes that model makes to passed arguments X and Y will not be reflected back to the main process. So if model is called repeatedly with the same arguments over and over, as it appears to be, it's not clear whether each call will return different values, especially if the calls are being done in parallel.
from concurrent.futures import ProcessPoolExecutor
def model(X,Y):
ada_clf={}
pred1={}
auc_final=[]
for iteration in range(100):
ada_clf[iteration] = AdaBoostClassifier(DecisionTreeClassifier(),n_estimators=1000,learning_rate=0.001)
ada_clf[iteration].fit(X,Y)
pred1[iteration]=ada_clf[iteration].predict(test1)
individuallabelsfromada1=[]
for i in range(len(test1)):
individuallabelsfromada1.append([])
for j in range(100):
individuallabelsfromada1[i].append(pred1[j][i])
final_labels_ada1=[]
for each in individuallabelsfromada1:
final_labels_ada1.append(find_majority(each))
final=pd.Series(final_labels_ada1)
temp_arr=np.array(final)
total_labels2=pd.Series(temp_arr)
fpr, tpr, thresholds = roc_curve(y_test, total_labels2, pos_label=1)
auc_final.append(auc(fpr,tpr))
#q.put(total_labels2)
#q1.put(auc_final)
#q2.put(ada_clf)
return total_labels2, auc_final, ada_clf
#print('done')
if __name__ == '__main__':
with ProcessPoolExecutor() as executor:
futures = [executor.submit(model, x_train, y_labels) for iteration in range(100)]
# simple lists will suffice:
overall_labels = []
final_auc = []
final_ada_clf = []
for future in futures:
# get return value and store
total_labels2, auc_final, ada_clf = future.result()
overall_labels.append(total_labels2)
final_auc.append(auc_final)
final_ada_clf.append(ada_clf)
Update
It wasn't clear from the problem specification that the returned results are based on a random number generator and if successive calls to the worker function, model, do not employ a single random number generator across all processes in the multiprocessing pool, then the multiprocessing implementation will clearly return different results than the non-multiprocessing version. And it is not clear from the code provided where the random number generator is being used; it may be in library code that you have no access to. If that is the case, you have two options: (1) Use multithreading instead by changing the import statement as I have indicated in the code below; you may still achieve performance benefits as I have already mentioned or (2) Update the signature to model as follows. You will be passed a new argument, random_generator, that currently supports two methods, randint (like random.randint and random (like random.random), although it should be easy enough to modify the code if you need a different method from module random. You will use this random number generator in place of module random if you are able to. But note that this random generator will run much more slowly than the standard one; this is the price you pay.
Since we are also adding a repetition argument to model (it now has to be the final argument -- note the updated signature below), we can now use method map (no need to use a callback):
def model(X,Y, random_generator, repetition):
...
etc.
from multiprocessing import Pool
# or use the following import instead to use multithreading (but then use standard random generator):
# from multiprocessing.dummy import Pool
import random
from functools import partial
from multiprocessing.managers import BaseManager
class RandomGeneratorManager(BaseManager):
pass
class RandomGenerator:
def __init__(self):
random.seed(0)
def randint(self, a, b):
return random.randint(a, b)
def random(self):
return random.random()
# add other functions if needed
if __name__ == '__main__':
RandomGeneratorManager.register('RandomGenerator', RandomGenerator)
with RandomGeneratorManager() as manager:
random_generator = manager.RandomGenerator()
# why 40? why not use default, which is the number of cpu cores you have?:
pool = Pool(40):
worker = partial(model, x_train, y_labels, random_generator)
results = pool.map(worker, range(100))
I'm trying to add a progression bar to my program, however, solutions that seems to works for other (on other posts) do not work for me.
Python version 3.6.
import multiprocessing as mp
import tqdm
def f(dynamic, fix1, fix2):
return dynamic + fix1 + fix2
N = 2
fix1 = 5
fix2= 10
dynamic = range(10)
p = mp.Pool(processes = N)
for _ in tqdm.tqdm(p.starmap(f, [(d, fix1, fix2) for d in dynamic]), total = len(dynamic)):
pass
p.close()
p.join()
Any idea why the multiprocessing works (the computation is done), but there is no progress bar?
NB: The example above is dummy, my function are different.
Other question: how can I interrupt properly a multiprocessing program? The ctrl+C that I usually do in signle thread seems to pose some issues.
Unfortunately, tqdm is not working with starmap. You can use the following:
def f(args):
arg1, arg2 = args
... do something with arg1, arg2 ...
for _ in tqdm.tqdm(pool.imap_unordered(f, zip(list_of_args, list_of_args2)), total=total):
pass
I am struggling for a while with Multiprocessing in Python. I would like to run 2 independent functions simultaneously, wait until both calculations are finished and then continue with the output of both functions. Something like this:
# Function A:
def jobA(num):
result=num*2
return result
# Fuction B:
def jobB(num):
result=num^3
return result
# Parallel process function:
{resultA,resultB}=runInParallel(jobA(num),jobB(num))
I found other examples of multiprocessing however they used only one function or didn't returned an output. Anyone knows how to do this? Many thanks!
I'd recommend creating processes manually (rather than as part of a pool), and sending the return values to the main process through a multiprocessing.Queue. These queues can share almost any Python object in a safe and relatively efficient way.
Here's an example, using the jobs you've posted.
def jobA(num, q):
q.put(num * 2)
def jobB(num, q):
q.put(num ^ 3)
import multiprocessing as mp
q = mp.Queue()
jobs = (jobA, jobB)
args = ((10, q), (2, q))
for job, arg in zip(jobs, args):
mp.Process(target=job, args=arg).start()
for i in range(len(jobs)):
print('Result of job {} is: {}'.format(i, q.get()))
This prints out:
Result of job 0 is: 20
Result of job 1 is: 1
But you can of course do whatever further processing you'd like using these values.
I am trying to chain several tasks together in iPyParallel, like
import ipyparallel
client = ipyparallel.Client()
view = client.load_balanced_view()
def task1(x):
## Do some work.
return x * 2
def task2(x):
## Do some work.
return x * 3
def task3(x):
## Do some work.
return x * 4
results1 = view.map_async(task1, [1, 2, 3])
results2 = view.map_async(task2, results1.get())
results3 = view.map_async(task3, results2.get())
However, this code won't submit any task2 unless task1 is done and is essentially blocking. My tasks can take different time and it is very inefficient. Is there an easy way that I can chain these steps efficiently and engines can get the results from previous steps? Something like:
def task2(x):
## Do some work.
return x.get() * 3 ## Get AsyncResult out.
def task3(x):
## Do some work.
return x.get() * 4 ## Get AsyncResult out.
results1 = [view.apply_async(task1, x) for x in [1, 2, 3]]
results2 = []
for x in result1:
view.set_flags(after=x.msg_ids)
results2.append(view.apply_async(task2, x))
results3 = []
for x in result2:
view.set_flags(after=x.msg_ids)
results3.append(view.apply_async(task3, x))
Apparently, this will fail as AsyncResult is not pickable.
I was considering a few solutions:
Use view.map_async(ordered=False).
results1 = view.map_async(task1, [1, 2, 3], ordered=False)
for x in results1:
results2.append(view.apply_async(task2, x.get()))
But this has to wait for all task1 to finish before any task3 can be submitted. It is still blocking.
Use asyncio.
#asyncio.coroutine
def submitter(x):
result1 = yield from asyncio.wrap_future(view.apply_async(task1, x))
result2 = yield from asyncio.wrap_future(view.apply_async(task2, result1)
result3 = yield from asyncio.wrap_future(view.apply_async(task3, result2)
yield result3
#asyncio.coroutine
def submit_all(ls):
jobs = [submitter(x) for x in ls]
results = []
for async_r in asyncio.as_completed(jobs):
r = yield from async_r
results.append(r)
## Do some work, like analysing results.
It is working, but the code soon become messy and unintuitive when more complicated tasks are introduced.
Thank you for your help.
Option 1: chain futures
IPython parallel isn't the best at doing this because the connection has to be done at the client level. You do have to wait for the results to complete and return to the client before submitting the results. Essentially, your asyncio submit_all is the right way to do it for IPython parallel. You can get something a little more generic by writing a chain function that uses add_done_callback to submit the new task when the previous one completes:
from concurrent.futures import Future
from functools import partial
def chain_apply(view, func, future):
"""Chain a call to view.apply(func, future.result()) when future is ready.
Returns a Future for the subsequent result.
"""
f2 = Future()
# when f1 is ready, submit a new task for func on its result
def apply_func(f):
if f.exception():
f2.set_exception(f.exception())
return
print('submitting %s(%s)' % (func.__name__, f.result()))
ar = view.apply_async(func, f.result())
# when ar is done, pass through the result to f2
ar.add_done_callback(lambda ar: f2.set_result(ar.get()))
future.add_done_callback(apply_func)
return f2
def chain_map(view, func, list_of_futures):
"""Chain a new callback on a list of futures."""
return [ chain_apply(view, func, f) for f in list_of_futures ]
# use builtin map with apply, since we want one Future per item
results1 = map(partial(view.apply, task1), [1, 2, 3])
results2 = chain_map(view, task2, results1)
results3 = chain_map(view, task3, results2)
print("Waiting for results")
[ r.result() for r in results3 ]
As with any example of add_done_callback, it can be written with coroutines, but I find the callbacks in this case to be fine. This should at least be a fairly generic utility that you can use to compose your pipeline.
Option 2: dask.distributed
Full disclosure: I'm the primary author of IPython Parallel, about to suggest that you use a different tool.
It is possible to pass results from one task to another via engine namespaces and DAG dependencies in IPython parallel, but honestly, if your workflow looks like this, you should consider using dask distributed, which is designed specifically for this kind of computation graph. If you are already comfortable and familiar with IPython parallel, getting started with dask should not be too much of a burden.
IPython 5.1 provides a handy command for turning your IPython parallel cluster into a dask distributed cluster:
import ipyparallel as ipp
client = ipp.Client()
executor = client.become_distributed(ncores=1)
And then the key relevant feature of dask is that you can submit futures as arguments to subsequent map calls, and the scheduler takes care of it when the results are ready, rather than having to do it explicitly in the client:
results1 = executor.map(task1, [1, 2, 3])
results2 = executor.map(task2, results1)
results3 = executor.map(task3, results2)
executor.gather(results3)
So basically, dask distributed works how you wish IPython parallel's load-balancing would work when you need to chain things like this.
This notebook illustrates both examples.