Tqdm progress bar only shows after process end with ProcessPoolExecutor - python

My tqdm progress bar doesn't show during my multithreaded process; I only see it after the process is finished.
Here is a way to reproduce the problem.
I coded these two methods:
from concurrent.futures import ProcessPoolExecutor
import sys

from colorama import Fore


def parallelize(desc, func, array, max_workers):
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        output_data = list(progress_bar(desc, list(executor.map(func, array))))
    return output_data


def progress_bar(desc, array):
    return tqdm(array,
                total=len(array),
                file=sys.stdout,
                ascii=' >',
                desc=desc,
                bar_format="%s{l_bar}%s{bar:30}%s{r_bar}" % (Fore.RESET, Fore.BLUE, Fore.RESET))
You can test it this way:
from tqdm import tqdm

test = range(int(1e4))

def identity(x):
    return x

parallelize("", identity, test, 2)
It prints the following (showing 00:00 elapsed), but the whole run actually takes around 3 seconds:
100%|>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>| 10000/10000 [00:00<00:00, 3954279.25it/s]
Thanks for the help

I think this is because when you call your progress bar
output_data = list(progress_bar(desc, list(executor.map(func, array))))
Python first evaluates executor.map(func, array) completely and only then passes the results to progress_bar, so the bar only ever sees the already-finished list. It won't be exactly the same, but I can share a boilerplate of how to parallelize a Python function with joblib:
from joblib import Parallel, delayed

def func(a):
    # Do something
    ...

# Parallelize the call
Parallel(n_jobs=-1)(delayed(func)(a) for a in tqdm(array, total=len(array)))

I replaced this method with the following and it worked:
def parallelize(desc, func, array, max_workers):
    return Parallel(n_jobs=max_workers)(delayed(func)(a) for a in progress_bar(desc, array))
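For completeness, here is a minimal end-to-end sketch of that working combination (joblib driving the workers, the tqdm bar wrapping the input iterable); the identity function and the 2-worker setting are just the toy values from the question:
import sys

from colorama import Fore
from joblib import Parallel, delayed
from tqdm import tqdm


def progress_bar(desc, array):
    # Same bar as in the question: blue bar, description on the left
    return tqdm(array,
                total=len(array),
                file=sys.stdout,
                ascii=' >',
                desc=desc,
                bar_format="%s{l_bar}%s{bar:30}%s{r_bar}" % (Fore.RESET, Fore.BLUE, Fore.RESET))


def parallelize(desc, func, array, max_workers):
    # The bar wraps the input iterable, so it advances as items are dispatched to workers
    return Parallel(n_jobs=max_workers)(delayed(func)(a) for a in progress_bar(desc, array))


def identity(x):
    return x


if __name__ == '__main__':
    parallelize("", identity, range(int(1e4)), 2)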

Related

Multiple iterations of function with multiple arguments returning multiple values using Multiprocessing in python

I am doing 100 iterations of the function model, so I tried using multiprocessing to distribute the tasks. For getting the final output I tried using a queue, but it takes too much time, defeating the purpose of multiprocessing. How can I solve this problem?
def model(X,Y):
    ada_clf={}
    pred1={}
    auc_final=[]
    for iteration in range(100):
        ada_clf[iteration] = AdaBoostClassifier(DecisionTreeClassifier(),n_estimators=1000,learning_rate=0.001)
        ada_clf[iteration].fit(X,Y)
        pred1[iteration]=ada_clf[iteration].predict(test1)
    individuallabelsfromada1=[]
    for i in range(len(test1)):
        individuallabelsfromada1.append([])
        for j in range(100):
            individuallabelsfromada1[i].append(pred1[j][i])
    final_labels_ada1=[]
    for each in individuallabelsfromada1:
        final_labels_ada1.append(find_majority(each))
    final=pd.Series(final_labels_ada1)
    temp_arr=np.array(final)
    total_labels2=pd.Series(temp_arr)
    fpr, tpr, thresholds = roc_curve(y_test, total_labels2, pos_label=1)
    auc_final.append(auc(fpr,tpr))
    q.put(total_labels2)
    q1.put(auc_final)
    q2.put(ada_clf)
    print('done')

overall_labels={}
final_auc={}
final_ada_clf={}
processes=[]
q=Queue()
q1=Queue()
q2=Queue()
for iteration in range(100):
    if __name__=='__main__':
        p=multiprocessing.Process(target=model,args=(x_train,y_labels,q,q1,q2,))
        overall_labels[iteration]=q.get()
        final_auc[iteration]=q1.get()
        final_ada_clf[iteration]=q2.get()
        p.start()
        processes.append(p)
for each in processes:
    each.join()
Below is my edited version, but it returns only a single output. I tried returning multiple outputs but could not get it to work, so I settled for a single output, i.e. total_labels2:
## code before this is the same as above; the only change is the signature of model from def model(X,Y) to def model(repeat,X,Y)
    total_labels2 = pd.Series(temp_arr)
    return (repeat, total_labels2)

def get_result(total_labels2):
    global testover_forall
    testover_forall.append(total_labels2)

if __name__ == '__main__':
    import multiprocessing as mp
    testover_forall = []
    pool = mp.Pool(40)
    for repeat in range(100):
        pool.apply_async(bound_model, args=(repeat, x_train, y_train), callback=get_result)
    pool.close()
    pool.join()
    repetations_index=[]
    for i in range(100):
        repetations_index.append(testover_forall[i][0])
    final_last_labels = {}
    for i in range(100):
        temp = str(i)
        final_last_labels[temp] = testover_forall[repetations_index[i]][1]
    totally_last_labels=[]
    for each in final_last_labels:
        temp=np.array(final_last_labels[each])
        totally_last_labels.append(temp)
See my comments (actually questions) to your post.
You should be using a multiprocessing pool to limit the number of processes you create to the number of CPU cores you have. This will also make it easier to get return values back from your model function instead of writing results to 3 different queues (and you could have written a tuple of 3 values to just one queue). You will, of course, require other import statements and code. Given your use of numpy and other libraries that may be implemented in C, you could also retry running this with threading to see whether that helps or hurts performance; do this by replacing ProcessPoolExecutor with ThreadPoolExecutor in the two places it is referenced.
Note
Any changes that model makes to passed arguments X and Y will not be reflected back to the main process. So if model is called repeatedly with the same arguments over and over, as it appears to be, it's not clear whether each call will return different values, especially if the calls are being done in parallel.
from concurrent.futures import ProcessPoolExecutor

def model(X,Y):
    ada_clf={}
    pred1={}
    auc_final=[]
    for iteration in range(100):
        ada_clf[iteration] = AdaBoostClassifier(DecisionTreeClassifier(),n_estimators=1000,learning_rate=0.001)
        ada_clf[iteration].fit(X,Y)
        pred1[iteration]=ada_clf[iteration].predict(test1)
    individuallabelsfromada1=[]
    for i in range(len(test1)):
        individuallabelsfromada1.append([])
        for j in range(100):
            individuallabelsfromada1[i].append(pred1[j][i])
    final_labels_ada1=[]
    for each in individuallabelsfromada1:
        final_labels_ada1.append(find_majority(each))
    final=pd.Series(final_labels_ada1)
    temp_arr=np.array(final)
    total_labels2=pd.Series(temp_arr)
    fpr, tpr, thresholds = roc_curve(y_test, total_labels2, pos_label=1)
    auc_final.append(auc(fpr,tpr))
    #q.put(total_labels2)
    #q1.put(auc_final)
    #q2.put(ada_clf)
    return total_labels2, auc_final, ada_clf
    #print('done')

if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(model, x_train, y_labels) for iteration in range(100)]
        # simple lists will suffice:
        overall_labels = []
        final_auc = []
        final_ada_clf = []
        for future in futures:
            # get return value and store
            total_labels2, auc_final, ada_clf = future.result()
            overall_labels.append(total_labels2)
            final_auc.append(auc_final)
            final_ada_clf.append(ada_clf)
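To make the process-vs-thread comparison suggested above easy to try, one hedged option (not part of the original answer) is to parametrize the executor class so the swap happens in a single place; model, x_train and y_labels are the names used in the code above:
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_models(executor_cls, n_repeats=100):
    # executor_cls is either ProcessPoolExecutor or ThreadPoolExecutor
    with executor_cls() as executor:
        futures = [executor.submit(model, x_train, y_labels) for _ in range(n_repeats)]
        return [future.result() for future in futures]

if __name__ == '__main__':
    results = run_models(ProcessPoolExecutor)   # or run_models(ThreadPoolExecutor) to benchmark threads
    overall_labels, final_auc, final_ada_clf = (list(t) for t in zip(*results))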
Update
It wasn't clear from the problem specification that the returned results depend on a random number generator. If successive calls to the worker function, model, do not share a single random number generator across all processes in the multiprocessing pool, then the multiprocessing implementation will clearly return different results than the non-multiprocessing version. And it is not clear from the code provided where the random number generator is used; it may be in library code that you have no access to. If that is the case, you have two options: (1) use multithreading instead by changing the import statement as indicated in the code below; you may still achieve performance benefits, as already mentioned; or (2) update the signature of model as follows. You will be passed a new argument, random_generator, that currently supports two methods, randint (like random.randint) and random (like random.random), although it should be easy enough to modify the code if you need a different method from module random. You would use this random number generator in place of module random wherever you are able to. But note that this shared generator will run much more slowly than the standard one; that is the price you pay.
Since we are also adding a repetition argument to model (it now has to be the final argument -- note the updated signature below), we can now use method map (no need to use a callback):
def model(X, Y, random_generator, repetition):
    ...
etc.
from multiprocessing import Pool
# or use the following import instead to use multithreading (but then use the standard random generator):
# from multiprocessing.dummy import Pool
import random
from functools import partial
from multiprocessing.managers import BaseManager

class RandomGeneratorManager(BaseManager):
    pass

class RandomGenerator:
    def __init__(self):
        random.seed(0)
    def randint(self, a, b):
        return random.randint(a, b)
    def random(self):
        return random.random()
    # add other functions if needed

if __name__ == '__main__':
    RandomGeneratorManager.register('RandomGenerator', RandomGenerator)
    with RandomGeneratorManager() as manager:
        random_generator = manager.RandomGenerator()
        # why 40? why not use the default, which is the number of cpu cores you have?:
        pool = Pool(40)
        worker = partial(model, x_train, y_labels, random_generator)
        results = pool.map(worker, range(100))
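For illustration only (not from the original answer), the worker would then draw its random numbers through the proxy instead of calling module random directly; the body below is a hypothetical stand-in for the real model:
def model(X, Y, random_generator, repetition):
    # Every call below goes through the manager process, so all workers share
    # one seeded stream; this is also why it is slower than local random calls.
    noise = [random_generator.random() for _ in range(5)]
    pick = random_generator.randint(0, 99)
    return repetition, pick, noise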

Tqdm for a progress bar per process pool worker

The task I am trying to achieve is to process thousands of artifacts of different sizes on a multi-core machine. I wish to use ProcessPoolExecutor to distribute the jobs and have each worker tell me which file it is working on.
So far, I have the following:
from concurrent.futures import ProcessPoolExecutor
from itertools import islice, cycle
import time
import tqdm
import multiprocessing
import random

worker_count = min(multiprocessing.cpu_count(), 10)
flist = range(100)
executor = ProcessPoolExecutor(max_workers=worker_count)

with tqdm.tqdm(total=len(flist), leave=False) as t:
    t.set_description_str("Extracting ... ")
    pbars = []
    for idx in range(t.pos + 1, t.pos + 1 + worker_count):
        pbars.append(tqdm.tqdm(position=idx, bar_format='{desc}', leave=False))

    def process(entry):
        artifact, idx = entry
        time.sleep(random.randint(0, worker_count)/10.0)
        pbars[idx].set_description_str(f'Working on {artifact}', refresh=True)
        return artifact

    for _, _ in zip(flist, executor.map(process, zip(flist, islice(cycle(range(worker_count)), len(flist))))):
        t.update()

    for idx in range(worker_count):
        pbars[idx].set_description_str(" "*(pbars[idx].ncols - 1), refresh=True)
        pbars[idx].clear()
        pbars[idx].close()
Of course, instead of the numbers, I will be displaying the file names.
Now, the questions are:
Is there a better, more Pythonic way to achieve what I want?
The last bit about clearing the pbars seems obnoxious to me; I do that basically to clean up the terminal when the program finishes. Perhaps there is a better way?

taking many random samples in parallel python

I'm trying to repeatedly run a function that requires a few positional arguments and involves random number generation (to generate many samples of a distribution). As an MWE, I think this captures everything:
import numpy as np
import multiprocessing as mup
from functools import partial

def rarr(xsize, ysize, k):
    return np.random.rand(xsize, ysize)

def clever_array(nsamp, xsize=100, ysize=100, ncores=None):
    np.random.seed()
    if ncores is None:
        p = mup.Pool()
    else:
        p = mup.Pool(ncores)
    out = p.map_async(partial(rarr, xsize, ysize), range(nsamp))
    p.close()
    return np.array(out.get())
Note that the final positional argument of rarr() is just a dummy variable, since I am using map_async(), which requires an iterable. Now if I run %timeit clever_array(500, ncores = 1) I get 208 ms, whereas %timeit clever_array(500, ncores = 5) takes 149 ms. So there is definitely some kind of parallelism happening (the speedup isn't terribly impressive for this MWE but is decent in my real code).
However, I'm wondering a few things: is there a more natural implementation than passing a dummy variable to rarr() as the iterable for map_async() just to run it many times? Is there any obvious way to pass the xsize and ysize args to rarr() other than partial()? And is there any way to ensure different results from the different cores other than initializing a different random.seed() every time?
Thanks for any help!
Typically when we use multiprocessing we expect a different result from each invocation of the worker function, so it doesn't quite make sense to call exactly the same function with exactly the same inputs many times. To ensure the randomness of the sampling output, it is best to separate the random state (seed) from the function itself. The approach recommended by the official NumPy docs is to use a np.random.Generator object, created via np.random.default_rng([seed]). With that we can modify your code to:
import numpy as np
import multiprocessing as mup
from functools import partial

def rarr(xsize, ysize, rng):
    return rng.random((xsize, ysize))

def clever_array(nsamp, xsize=100, ysize=100, ncores=None):
    if ncores is None:
        p = mup.Pool()
    else:
        p = mup.Pool(ncores)
    out = p.map_async(partial(rarr, xsize, ysize), map(np.random.default_rng, range(nsamp)))
    p.close()
    return np.array(out.get())
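A quick hedged sanity check of the snippet above (reusing its clever_array, with toy sizes chosen for speed): each sample is drawn from its own seeded generator, so different samples differ from each other while the whole call stays reproducible across runs.
if __name__ == '__main__':
    samples = clever_array(4, xsize=2, ysize=2, ncores=2)
    print(samples.shape)                        # (4, 2, 2)
    print(np.allclose(samples[0], samples[1]))  # False: seeds 0 and 1 give different draws
    print(np.allclose(samples, clever_array(4, xsize=2, ysize=2, ncores=2)))  # True: fixed seeds 0..3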

Use tqdm with concurrent.futures?

I have a multithreaded function for which I would like a status bar using tqdm. Is there an easy way to show a status bar with ThreadPoolExecutor? It is the parallelization part that is confusing me.
import concurrent.futures

def f(x):
    return x**2

my_iter = range(1000000)

def run(f, my_iter):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = list(executor.map(f, my_iter))
    return results

run(f, my_iter)  # wrap tqdm around this function?
You can wrap tqdm around executor.map as follows to track progress:
list(tqdm(executor.map(f, my_iter), total=len(my_iter)))
Here is your example:
import time
import concurrent.futures
from tqdm import tqdm

def f(x):
    time.sleep(0.001)  # to visualize the progress
    return x**2

def run(f, my_iter):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = list(tqdm(executor.map(f, my_iter), total=len(my_iter)))
    return results

my_iter = range(100000)
run(f, my_iter)
And the result is like this:
16%|██▏ | 15707/100000 [00:00<00:02, 31312.54it/s]
The problem with the accepted answer is that ThreadPoolExecutor.map yields results in the order the tasks were submitted, not in the order they become available. So if the first submitted call happens to be, for example, the last one to complete, the progress bar will stay at 0% and then jump to 100% all at once, only when all of the calls have completed. Much better is to use ThreadPoolExecutor.submit with as_completed:
import time
import concurrent.futures
from tqdm import tqdm

def f(x):
    time.sleep(0.001)  # to visualize the progress
    return x**2

def run(f, my_iter):
    l = len(my_iter)
    with tqdm(total=l) as pbar:
        # let's give it some more threads:
        with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
            futures = {executor.submit(f, arg): arg for arg in my_iter}
            results = {}
            for future in concurrent.futures.as_completed(futures):
                arg = futures[future]
                results[arg] = future.result()
                pbar.update(1)
    print(321, results[321])

my_iter = range(100000)
run(f, my_iter)
Prints:
321 103041
This is just the general idea. Depending upon the type of my_iter, it may not be possible to apply the len function to it directly without first converting it into a list, as sketched below. The main point is to use submit with as_completed.
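For example, a hedged adaptation of run for the case where my_iter is a one-shot generator (assuming it fits in memory):
import concurrent.futures

from tqdm import tqdm

def run(f, my_iter):
    my_iter = list(my_iter)  # materialize first so len() works and the items can be revisited
    with tqdm(total=len(my_iter)) as pbar:
        with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
            futures = {executor.submit(f, arg): arg for arg in my_iter}
            results = {}
            for future in concurrent.futures.as_completed(futures):
                results[futures[future]] = future.result()
                pbar.update(1)
    return results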
Shortest way, I think:
with ThreadPoolExecutor(max_workers=20) as executor:
    results = list(tqdm(executor.map(myfunc, range(len(my_array))), total=len(my_array)))
I tried the example above but the progress bar still failed for me; then I found this post, which seems to offer a useful, short approach:
import concurrent.futures

from tqdm import tqdm

def tqdm_parallel_map(fn, *iterables):
    """Use tqdm to show progress."""
    executor = concurrent.futures.ProcessPoolExecutor()
    futures_list = []
    for iterable in iterables:
        futures_list += [executor.submit(fn, i) for i in iterable]
    for f in tqdm(concurrent.futures.as_completed(futures_list), total=len(futures_list)):
        yield f.result()

def multi_cpu_dispatcher_process_tqdm(data_list, single_job_fn):
    """Multi-CPU dispatcher; assumes each job returns an iterable of results."""
    output = []
    for result in tqdm_parallel_map(single_job_fn, data_list):
        output += result
    return output
I find it more intuitive to use the update() method of tqdm; we keep a human-readable structure:
with tqdm(total=len(mylist)) as progress:
    with ThreadPoolExecutor() as executor:
        for __ in executor.map(fun, mylist):
            progress.update()  # update the progress bar each time a job finishes
Since I don't care about the output of fun, I use __ as a throwaway variable. A self-contained version of this pattern is sketched below.
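Here is a minimal, hedged self-contained version of that pattern, with a toy fun and mylist standing in for the real workload:
import time
from concurrent.futures import ThreadPoolExecutor

from tqdm import tqdm

def fun(x):
    time.sleep(0.001)  # stand-in for real work
    return x * x

mylist = list(range(10000))

with tqdm(total=len(mylist)) as progress:
    with ThreadPoolExecutor() as executor:
        for __ in executor.map(fun, mylist):
            progress.update()  # one tick per completed job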

tqdm progress bar and multiprocessing

I'm trying to add a progress bar to my program; however, solutions that seem to work for others (in other posts) do not work for me.
Python version 3.6.
import multiprocessing as mp
import tqdm

def f(dynamic, fix1, fix2):
    return dynamic + fix1 + fix2

N = 2
fix1 = 5
fix2 = 10
dynamic = range(10)

p = mp.Pool(processes=N)
for _ in tqdm.tqdm(p.starmap(f, [(d, fix1, fix2) for d in dynamic]), total=len(dynamic)):
    pass
p.close()
p.join()
Any idea why the multiprocessing works (the computation is done), but there is no progress bar?
NB: The example above is a dummy; my functions are different.
Another question: how can I properly interrupt a multiprocessing program? The Ctrl+C that I usually use in single-threaded programs seems to cause some issues.
Unfortunately, tqdm does not show incremental progress with starmap, because starmap blocks until all the results are ready; by the time your loop starts iterating, everything has already finished. You can use imap_unordered with a single-argument wrapper instead:
def f(args):
    arg1, arg2 = args
    # ... do something with arg1, arg2 ...

for _ in tqdm.tqdm(pool.imap_unordered(f, zip(list_of_args, list_of_args2)), total=total):
    pass
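Applied to the question's toy example, a hedged end-to-end version might look like this; the wrapper f unpacks the argument tuple that the list comprehension builds:
import multiprocessing as mp

import tqdm

def f(args):
    dynamic, fix1, fix2 = args
    return dynamic + fix1 + fix2

if __name__ == '__main__':
    N = 2
    fix1 = 5
    fix2 = 10
    dynamic = range(10)
    with mp.Pool(processes=N) as pool:
        results = []
        for res in tqdm.tqdm(pool.imap_unordered(f, [(d, fix1, fix2) for d in dynamic]),
                             total=len(dynamic)):
            results.append(res)
    print(sorted(results))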
