I'm trying to add a progress bar to my program; however, solutions that seem to work for others (in other posts) do not work for me.
Python version 3.6.
import multiprocessing as mp
import tqdm

def f(dynamic, fix1, fix2):
    return dynamic + fix1 + fix2

N = 2
fix1 = 5
fix2 = 10
dynamic = range(10)

p = mp.Pool(processes=N)
for _ in tqdm.tqdm(p.starmap(f, [(d, fix1, fix2) for d in dynamic]), total=len(dynamic)):
    pass
p.close()
p.join()
Any idea why the multiprocessing works (the computation is done), but there is no progress bar?
NB: The example above is a dummy; my actual functions are different.
Other question: how can I properly interrupt a multiprocessing program? The Ctrl+C that I usually use in single-threaded code seems to cause some issues here.
Unfortunately, tqdm does not work with starmap, because starmap blocks until every result is ready, so the loop only starts once all the work is already done. You can use the following instead:
def f(args):
    arg1, arg2 = args
    # ... do something with arg1, arg2 ...

for _ in tqdm.tqdm(pool.imap_unordered(f, zip(list_of_args, list_of_args2)), total=total):
    pass
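For reference, here is a complete runnable sketch of that approach applied to the example from the question (reusing its f, fix1 and fix2, with an if __name__ == '__main__' guard added so it also works with the spawn start method). Because imap_unordered yields each result as soon as it is ready, tqdm can advance the bar while the pool is still working:
import multiprocessing as mp
import tqdm

def f(args):
    # unpack the (dynamic, fix1, fix2) tuple built below
    dynamic, fix1, fix2 = args
    return dynamic + fix1 + fix2

if __name__ == '__main__':
    fix1, fix2 = 5, 10
    tasks = [(d, fix1, fix2) for d in range(10)]
    with mp.Pool(processes=2) as pool:
        # imap_unordered is lazy, so the bar updates as results arrive
        for _ in tqdm.tqdm(pool.imap_unordered(f, tasks), total=len(tasks)):
            pass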
Related
My TQDM progress bar doesn't show during my multithreaded process, I only see it after the process is finished
Here is a way to reproduce the problem
I coded these two methods
from concurrent.futures import ProcessPoolExecutor
import sys
from colorama import Fore
def parallelize(desc, func, array, max_workers):
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        output_data = list(progress_bar(desc, list(executor.map(func, array))))
    return output_data

def progress_bar(desc, array):
    return tqdm(array,
                total=len(array),
                file=sys.stdout,
                ascii=' >',
                desc=desc,
                bar_format="%s{l_bar}%s{bar:30}%s{r_bar}" % (Fore.RESET, Fore.BLUE, Fore.RESET))
You can test it this way:
from tqdm import tqdm
test = range(int(1e4))
def identity(x):
    return x
parallelize("", identity, test, 2)
It prints the following (showing 00:00), even though the whole run takes around 3 seconds, i.e. the bar only appears once the work is already done:
100%|>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>| 10000/10000 [00:00<00:00, 3954279.25it/s]
Thanks for the help
I think this is because of the way you call your progress bar:
output_data = list(progress_bar(desc, list(executor.map(func, array))))
Python first evaluates executor.map(func, array) (and the inner list() waits for every result), and only then passes the finished list to progress_bar. It won't be the same, but I can share with you a boilerplate of how to parallelize a Python function:
from joblib import Parallel, delayed
from tqdm import tqdm

def func(a):
    # Do something
    ...

# Parallelize the call
results = Parallel(n_jobs=-1)(delayed(func)(a) for a in tqdm(array, total=len(array)))
I replaced this method and it worked:
def parallelize(desc, func, array, max_workers):
    return Parallel(n_jobs=max_workers)(delayed(func)(a) for a in progress_bar(desc, array))
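For reference, a similar fix is possible while keeping ProcessPoolExecutor: wrap the lazy iterator returned by executor.map in tqdm directly instead of materializing it with the inner list() first. This is only a sketch under the same names as above (and it assumes array supports len()); the desc/file arguments are kept minimal here:
from concurrent.futures import ProcessPoolExecutor
import sys
from tqdm import tqdm

def parallelize(desc, func, array, max_workers):
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # No inner list(): results are consumed (and the bar updated)
        # as executor.map yields them, rather than after everything is done.
        return list(tqdm(executor.map(func, array),
                         total=len(array), desc=desc, file=sys.stdout))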
The task I am trying to achieve is to process thousands of artifacts of different sizes on a multi-core machine. I wish to use ProcessPoolExecutor to distribute the jobs and have each worker tell me which file it is working on.
So far, I have the following:
from concurrent.futures import ProcessPoolExecutor
from itertools import islice, cycle
import time
import tqdm
import multiprocessing
import random
worker_count = min(multiprocessing.cpu_count(), 10)
flist = range(100)
executor = ProcessPoolExecutor(max_workers=worker_count)

with tqdm.tqdm(total=len(flist), leave=False) as t:
    t.set_description_str("Extracting ... ")
    pbars = []
    for idx in range(t.pos + 1, t.pos + 1 + worker_count):
        pbars.append(tqdm.tqdm(position=idx, bar_format='{desc}', leave=False))

    def process(entry):
        artifact, idx = entry
        time.sleep(random.randint(0, worker_count) / 10.0)
        pbars[idx].set_description_str(f'Working on {artifact}', refresh=True)
        return artifact

    for _, _ in zip(flist, executor.map(process, zip(flist, islice(cycle(range(worker_count)), len(flist))))):
        t.update()

    for idx in range(worker_count):
        pbars[idx].set_description_str(" " * (pbars[idx].ncols - 1), refresh=True)
        pbars[idx].clear()
        pbars[idx].close()
Of course, instead of the numbers, I will be displaying the file names.
Now, the questions are:
Is there a better, more Pythonic way to achieve what I want?
The last bit about clearing the pbars seems obnoxious to me. I do that basically to clean up the terminal when the program finishes. Perhaps there is a better way?
I want to parallelize a task (progresser()) for a range of input parameters (L). The progress of each task should be monitored by an individual progress bar in the terminal. I'm using the tqdm package for the progress bars. The following code works on my Mac for up to 23 progress bars (L = list(range(23)) and below), but produces chaotic jumping of the progress bars starting at L = list(range(24)). Does anyone have an idea how to fix this?
from time import sleep
import random
from tqdm import tqdm
from multiprocessing import Pool, freeze_support, RLock
L = list(range(24)) # works until 23, breaks starting at 24
def progresser(n):
    text = f'#{n}'
    sampling_counts = 10
    with tqdm(total=sampling_counts, desc=text, position=n+1) as pbar:
        for i in range(sampling_counts):
            sleep(random.uniform(0, 1))
            pbar.update(1)

if __name__ == '__main__':
    freeze_support()
    p = Pool(processes=None,
             initargs=(RLock(),), initializer=tqdm.set_lock)
    p.map(progresser, L)
    print('\n' * (len(L) + 1))
As an example of how it should look in general, I provide a screenshot for L = list(range(16)) below.
versions: python==3.7.3, tqdm==4.32.1
I'm not getting any jumping when I set the size to 30. Maybe you have more processors and can have more workers running.
However, if n grows large you will start to see jumps because of the nature of the chunksize.
I.e., p.map will split your input into chunks and give each process a chunk. So as n grows larger, so does your chunksize, and so does the position (pos=n+1)!
Note: although map preserves the order of the returned results, the order in which they are computed is arbitrary.
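To make the chunksize point concrete: when you don't pass a chunksize, Pool.map picks one with (roughly, in CPython's implementation) the heuristic sketched below, so it grows with the length of the input. default_chunksize and the example worker counts are only for illustration:
def default_chunksize(n_items, n_workers):
    # Approximation of CPython's Pool.map default: aim for about 4 chunks
    # per worker, rounding up, so larger inputs produce larger chunks.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize

print(default_chunksize(30, 4))    # 2  -> small input, small chunks
print(default_chunksize(1000, 4))  # 63 -> chunks grow with the input size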
As n grows large, I would suggest using the process id as the position, to view progress on a per-process basis.
from time import sleep
import random
from tqdm import tqdm
from multiprocessing import Pool, freeze_support, RLock
from multiprocessing import current_process
def progresser(n):
    text = f'#{n}'
    sampling_counts = 10
    current = current_process()
    pos = current._identity[0] - 1
    with tqdm(total=sampling_counts, desc=text, position=pos) as pbar:
        for i in range(sampling_counts):
            sleep(random.uniform(0, 1))
            pbar.update(1)

if __name__ == '__main__':
    freeze_support()
    L = list(range(30))  # works until 23, breaks starting at 24
    # p = Pool(processes=None,
    #          initargs=(RLock(),), initializer=tqdm.set_lock
    #          )
    with Pool(initializer=tqdm.set_lock, initargs=(tqdm.get_lock(),)) as p:
        p.map(progresser, L)
    print('\n' * (len(L) + 1))
I have a multithreaded function that I would like a status bar for using tqdm. Is there an easy way to show a status bar with ThreadPoolExecutor? It is the parallelization part that is confusing me.
import concurrent.futures

def f(x):
    return x**2

my_iter = range(1000000)

def run(f, my_iter):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = list(executor.map(f, my_iter))
    return results

run(f, my_iter)  # wrap tqdm around this function?
You can wrap tqdm around executor.map as follows to track the progress:
list(tqdm(executor.map(f, my_iter), total=len(my_iter)))
Here is your example:
import time
import concurrent.futures
from tqdm import tqdm
def f(x):
    time.sleep(0.001)  # to visualize the progress
    return x**2

def run(f, my_iter):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = list(tqdm(executor.map(f, my_iter), total=len(my_iter)))
    return results

my_iter = range(100000)
run(f, my_iter)
And the result is like this:
16%|██▏ | 15707/100000 [00:00<00:02, 31312.54it/s]
The problem with the accepted answer is that ThreadPoolExecutor.map yields results in the order the tasks were submitted, not in the order they become available. So if the first invocation of myfunc happens to be, for example, the last one to complete, the progress bar will go from 0% to 100% all at once, and only when all of the calls have completed. Much better would be to use ThreadPoolExecutor.submit with as_completed:
import time
import concurrent.futures
from tqdm import tqdm
def f(x):
    time.sleep(0.001)  # to visualize the progress
    return x**2

def run(f, my_iter):
    l = len(my_iter)
    with tqdm(total=l) as pbar:
        # let's give it some more threads:
        with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
            futures = {executor.submit(f, arg): arg for arg in my_iter}
            results = {}
            for future in concurrent.futures.as_completed(futures):
                arg = futures[future]
                results[arg] = future.result()
                pbar.update(1)
    print(321, results[321])

my_iter = range(100000)
run(f, my_iter)
Prints:
321 103041
This is just the general idea. Depending upon the type of my_iter, it may not be possible to apply the len function directly to it without first converting it into a list. The main point is to use submit with as_completed.
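If my_iter is a generator with no len, one workable variant (a sketch in the same spirit, reusing the imports from the example above; run_from_iterable is just an illustrative name) is to submit everything first and take the total from the list of futures:
def run_from_iterable(f, my_iter):
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        # Submitting consumes the iterable, so len(futures) gives the total.
        futures = [executor.submit(f, arg) for arg in my_iter]
        results = []
        with tqdm(total=len(futures)) as pbar:
            for future in concurrent.futures.as_completed(futures):
                results.append(future.result())
                pbar.update(1)
    return results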
The shortest way, I think:
with ThreadPoolExecutor(max_workers=20) as executor:
    results = list(tqdm(executor.map(myfunc, range(len(my_array))), total=len(my_array)))
I tried the example but the progress bar still failed; then I found this post, which seems useful and short to use:
def tqdm_parallel_map(fn, *iterables):
    """ use tqdm to show progress """
    executor = concurrent.futures.ProcessPoolExecutor()
    futures_list = []
    for iterable in iterables:
        futures_list += [executor.submit(fn, i) for i in iterable]
    for f in tqdm(concurrent.futures.as_completed(futures_list), total=len(futures_list)):
        yield f.result()

def multi_cpu_dispatcher_process_tqdm(data_list, single_job_fn):
    """ multi cpu dispatcher """
    output = []
    for result in tqdm_parallel_map(single_job_fn, data_list):
        output += result
    return output
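A minimal usage sketch, assuming the two functions above are defined in the same module; square_as_list is a made-up job function that returns a list, since the dispatcher extends output with each result, and it sits at module level so the process pool can pickle it:
def square_as_list(x):
    # each job returns a list so `output += result` concatenates cleanly
    return [x * x]

if __name__ == '__main__':
    squares = multi_cpu_dispatcher_process_tqdm(list(range(100)), square_as_list)
    print(len(squares))  # 100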
I find it more intuitive to use the update() method of tqdm; it keeps a human-readable structure:
with tqdm(total=len(mylist)) as progress:
    with ThreadPoolExecutor() as executor:
        for __ in executor.map(fun, mylist):
            progress.update()  # update the progress bar each time a job finishes
Since I don't care about the output of fun, I use __ as a throwaway variable.
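For completeness, a self-contained version of that structure; fun and mylist are placeholder names filled in here just to make it runnable:
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

def fun(x):
    return x * 2

mylist = list(range(1000))

with tqdm(total=len(mylist)) as progress:
    with ThreadPoolExecutor() as executor:
        for __ in executor.map(fun, mylist):
            progress.update()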
I have been struggling for a while with multiprocessing in Python. I would like to run 2 independent functions simultaneously, wait until both calculations are finished, and then continue with the output of both functions. Something like this:
# Function A:
def jobA(num):
    result = num * 2
    return result

# Function B:
def jobB(num):
    result = num ^ 3
    return result
# Parallel process function:
{resultA,resultB}=runInParallel(jobA(num),jobB(num))
I found other examples of multiprocessing; however, they used only one function or didn't return an output. Does anyone know how to do this? Many thanks!
I'd recommend creating processes manually (rather than as part of a pool), and sending the return values to the main process through a multiprocessing.Queue. These queues can share almost any Python object in a safe and relatively efficient way.
Here's an example, using the jobs you've posted.
def jobA(num, q):
    q.put(num * 2)

def jobB(num, q):
    q.put(num ^ 3)

import multiprocessing as mp

q = mp.Queue()
jobs = (jobA, jobB)
args = ((10, q), (2, q))

for job, arg in zip(jobs, args):
    mp.Process(target=job, args=arg).start()

for i in range(len(jobs)):
    print('Result of job {} is: {}'.format(i, q.get()))
This prints out:
Result of job 0 is: 20
Result of job 1 is: 1
But you can of course do whatever further processing you'd like using these values.
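Since the queue hands back results in whatever order the jobs finish (not necessarily the order the processes were started), a variant worth sketching tags each result with its job's name so they can be told apart afterwards; the tuple tagging and the results dict here are just an illustration:
import multiprocessing as mp

def jobA(num, q):
    q.put(('jobA', num * 2))

def jobB(num, q):
    q.put(('jobB', num ^ 3))

if __name__ == '__main__':
    q = mp.Queue()
    procs = [mp.Process(target=jobA, args=(10, q)),
             mp.Process(target=jobB, args=(2, q))]
    for p in procs:
        p.start()

    # Collect one tagged result per process, regardless of finish order.
    results = dict(q.get() for _ in procs)
    for p in procs:
        p.join()

    print(results)  # e.g. {'jobA': 20, 'jobB': 1}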