I want to parallelize a task (progresser()) over a range of input parameters (L). The progress of each task should be monitored by an individual progress bar in the terminal, using the tqdm package. The following code works on my Mac for up to 23 progress bars (L = list(range(23)) and below), but produces chaotic jumping of the progress bars starting at L = list(range(24)). Does anyone have an idea how to fix this?
from time import sleep
import random
from tqdm import tqdm
from multiprocessing import Pool, freeze_support, RLock

L = list(range(24))  # works until 23, breaks starting at 24

def progresser(n):
    text = f'#{n}'
    sampling_counts = 10
    with tqdm(total=sampling_counts, desc=text, position=n+1) as pbar:
        for i in range(sampling_counts):
            sleep(random.uniform(0, 1))
            pbar.update(1)

if __name__ == '__main__':
    freeze_support()
    p = Pool(processes=None,
             initargs=(RLock(),), initializer=tqdm.set_lock)
    p.map(progresser, L)
    print('\n' * (len(L) + 1))
As an example of how it should look in general, I provide a screenshot for L = list(range(16)) below.
versions: python==3.7.3, tqdm==4.32.1
I'm not getting any jumping when I set the size to 30; maybe you have more processors than I do and can run more workers.
However, as n grows large you will start to see jumps because of the nature of the chunksize: p.map splits your input into chunks and gives each process a chunk. So as n grows larger, so does your chunksize, and so does your... yup, position (pos=n+1)!
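For reference, here is a sketch of the heuristic Pool.map uses to pick a chunksize when you don't pass one explicitly (based on CPython's multiprocessing/pool.py; the exact code may differ between versions):

# Sketch of the default chunksize heuristic used by Pool.map when no
# explicit chunksize is given (based on CPython's multiprocessing/pool.py;
# details may vary between versions).
def default_chunksize(iterable_len, num_workers):
    chunksize, extra = divmod(iterable_len, num_workers * 4)
    if extra:
        chunksize += 1
    return chunksize

print(default_chunksize(24, 4))   # -> 2
print(default_chunksize(100, 4))  # -> 7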
Note: although map preserves the order of the returned results, the order in which they are computed is arbitrary.
As n grows large, I would suggest using the process ID as the position, to view progress on a per-process basis.
from time import sleep
import random
from tqdm import tqdm
from multiprocessing import Pool, freeze_support, RLock
from multiprocessing import current_process

def progresser(n):
    text = f'#{n}'
    sampling_counts = 10
    current = current_process()
    pos = current._identity[0] - 1
    with tqdm(total=sampling_counts, desc=text, position=pos) as pbar:
        for i in range(sampling_counts):
            sleep(random.uniform(0, 1))
            pbar.update(1)

if __name__ == '__main__':
    freeze_support()
    L = list(range(30))  # works until 23, breaks starting at 24
    with Pool(initializer=tqdm.set_lock, initargs=(tqdm.get_lock(),)) as p:
        p.map(progresser, L)
    print('\n' * (len(L) + 1))
I am currently trying to parallelize a rather large task of computing a complex system of differential equations. I want to parallelize the computation, so each computation has its own process. I need the results to be ordered, therefore I am using a dictionary to order them after the processes finish. I am also on Windows 10.
For now I am only running the identity function to check the code, but even then it simply runs all logical cores at 100% and does not compute anything (I waited 5 minutes).
Later on I will need to initialize each process with a bunch of variables to compute the actual system, defined in a solver() function further up in the code.
What is going wrong?
import multiprocessing as mp
import numpy as np

Nmin = 0
Nmax = 20
periods = np.linspace(Nmin, Nmax, 2*Nmax + 1)  # 0.5 steps
results = dict()

def identity(a):
    return a

with mp.Manager() as manager:
    sharedresults = manager.dict()
    with mp.Pool() as pool:
        print("pools are active")
        for result in pool.map(identity, periods):
            #sharedresults[per] = res
            print(result)

orderedResult = []
for k, v in sorted(results.items()):
    orderedResult.append(v)
The program gets to the "pools are active" message and after printing it, it just does nothing, I guess?
I am also using JupyterLab; not sure whether that is an issue.
There's a problem with multiprocessing and JupyterLab (on Windows, the spawned worker processes cannot import functions defined in a notebook), so you should use pathos instead.
import multiprocessing as mp
import numpy as np
import pathos.multiprocessing as mpathos

Nmin = 0
Nmax = 20
periods = np.linspace(Nmin, Nmax, 2*Nmax + 1)  # 0.5 steps
results = dict()

def identity(a):
    return a

with mp.Manager() as manager:
    sharedresults = manager.dict()
    with mpathos.Pool() as pool:
        print("pools are active")
        for result in pool.imap(identity, periods):
            #sharedresults[per] = res
            print(result)

orderedResult = []
for k, v in sorted(results.items()):
    orderedResult.append(v)
I'm currently setting up an automated simulation pipeline for OpenFOAM (a CFD library) using the PyFoam library within Python to create a large database for machine learning purposes. The database will have around 500k distinct simulations. To run this pipeline on multiple machines, I'm using the multiprocessing.Pool.starmap_async(args) option, which will continually start a new simulation once the old simulation has completed.
However, since some of the simulations might / will crash, I want to generate a text file with all cases which have crashed.
I've already found this thread, which implements multiprocessing.Manager.Queue() and adds a listener, but I failed to get it running with starmap_async(). For my testing I'm trying to print the case name for any simulation which has completed, but currently only one entry is written into the text file instead of all of them (the simulations all complete successfully).
So my question is: how can I write a message to a file for each simulation which has completed?
The current code layout looks roughly like this; only the important snippet has been added, as the remaining code can't be run without OpenFOAM and additional custom scripts which were created for the automation.
Any help is highly appreciated! :)
from PyFoam.Execution.BasicRunner import BasicRunner
from PyFoam.Execution.ParallelExecution import LAMMachine

import numpy as np
import multiprocessing
import itertools
import psutil

# Defining global variables
manager = multiprocessing.Manager()
queue = manager.Queue()

def runCase(airfoil, angle, velocity):
    # define simulation name
    newCase = str(airfoil) + "_" + str(angle) + "_" + str(velocity)
    '''
    A lot of pre-processing commands to prepare the simulation,
    which have been removed from this snippet, such as generating
    the geometry, creating the mesh etc...
    '''
    # run simulation
    machine = LAMMachine(nr=4)  # set number of cores for parallel execution
    simulation = BasicRunner(argv=[solver, "-case", case.name], silent=True, lam=machine, logname="solver")
    simulation.start()  # start simulation
    # check if simulation has completed
    if simulation.runOK():
        # write message into queue
        queue.put(newCase)
    if not simulation.runOK():
        print("Simulation did not run successfully")

def listener(queue):
    fname = 'errors.txt'
    msg = queue.get()
    while True:
        with open(fname, 'w') as f:
            if msg == 'complete':
                break
            f.write(str(msg) + '\n')

def main():
    # Create parameter list
    angles = np.arange(-5, 0, 1)
    machs = np.array([0.15])
    nacas = ['0012']
    paramlist = list(itertools.product(nacas, angles, np.round(machs, 9)))

    # create number of processes and keep 2 cores idle for other processes
    nCores = psutil.cpu_count(logical=False) - 2
    nProc = 4
    nProcs = int(nCores / nProc)

    with multiprocessing.Pool(processes=nProcs) as pool:
        pool.apply_async(listener, (queue,))          # start the listener
        pool.starmap_async(runCase, paramlist).get()  # run parallel simulations
        queue.put('complete')
        pool.close()
        pool.join()

if __name__ == '__main__':
    main()
First, when your with multiprocessing.Pool(processes=nProcs) as pool: block exits, there will be an implicit call to pool.terminate(), which will kill all pool processes and with them any running or queued-up tasks. There is no point in calling queue.put('complete') since nobody is listening.
Second, your "listener" task gets only a single message from the queue. If it is "complete", it terminates immediately. If it is anything else, it just loops continuously, writing the same message to the output file. This cannot be right, can it? Did you forget an additional call to queue.get() in your loop?
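For illustration, a minimal corrected listener would open the file once and call queue.get() on every iteration, something like this sketch:

# Minimal sketch of a corrected listener: one open() call, one
# queue.get() per iteration, terminating on the 'complete' sentinel.
def listener(queue):
    with open('errors.txt', 'w') as f:
        while True:
            msg = queue.get()
            if msg == 'complete':
                break
            f.write(str(msg) + '\n')
            f.flush()  # make messages visible while the pool is still running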
Third, I do not quite follow your computation of nProcs. Why the division by 4? If you had 5 physical processors, nProcs would be computed as 0. Do you mean something like:
nProcs = psutil.cpu_count(logical=False) // 4
if nProcs == 0:
    nProcs = 1
elif nProcs > 1:
    nProcs -= 1  # Leave a core free
Fourth, why do you need a separate "listener" task at all? Have your runCase task return whatever message is appropriate, according to how it completes, back to the main process. In the code below, multiprocessing.pool.Pool.imap is used so that results can be processed as the tasks complete and results are returned:
from PyFoam.Execution.BasicRunner import BasicRunner
from PyFoam.Execution.ParallelExecution import LAMMachine

import numpy as np
import multiprocessing
import itertools
import psutil

def runCase(tpl):
    # Unpack tuple:
    airfoil, angle, velocity = tpl
    # define simulation name
    newCase = str(airfoil) + "_" + str(angle) + "_" + str(velocity)
    ...  # Code omitted for brevity
    # check if simulation has completed
    if simulation.runOK():
        return ''  # No error
    # Simulation did not run successfully:
    return f"Simulation {newCase} did not run successfully"

def main():
    # Create parameter list
    angles = np.arange(-5, 0, 1)
    machs = np.array([0.15])
    nacas = ['0012']
    # There is no reason to convert this into a list; it
    # can be lazily computed:
    paramlist = itertools.product(nacas, angles, np.round(machs, 9))

    # create number of processes and keep 1 core idle for main process
    nCores = psutil.cpu_count(logical=False) - 1
    nProc = 4
    nProcs = int(nCores / nProc)

    with multiprocessing.Pool(processes=nProcs) as pool:
        with open('errors.txt', 'w') as f:
            # Process message results as soon as each task ends.
            # Use method imap_unordered if you do not care about the order
            # of the messages in the output.
            # imap passes a single argument to runCase; itertools.product
            # already yields one tuple per case, so paramlist can be
            # passed directly:
            for msg in pool.imap(runCase, paramlist):
                if msg != '':  # Error completion
                    print(msg)
                    print(msg, file=f)
        pool.close()
        pool.join()  # Not really necessary here

if __name__ == '__main__':
    main()
Question
I want to do a simple loop over n iterations. Each iteration contains a copy operation for which I know how many bytes were copied. My question is: how can I also show the number of MB/s in the progress bar?
Example
I'm showing a progress bar around rsync. I modified this answer as follows:
import subprocess
import re
import tqdm

n = len(open('myfiles.txt').readlines())
your_command = 'rsync -aP --files-from="myfiles.txt" myhost:mysource .'

pbar = tqdm.trange(n)
process = subprocess.Popen(your_command, stdout=subprocess.PIPE, shell=True)
for line in iter(process.stdout.readline, b''):
    line = line.decode("utf-8")
    if re.match(r'(.*)(xfr\#)([0-9])(.*)(to\-chk\=)([0-9])(.*)', line):
        pbar.update()
Whereby myfiles.txt contains a list of files. This gives me a good progress bar, showing the number of iterations per second.
However, the summary line that I match on to signal that a file was copied, e.g.
682,356 100% 496.92kB/s 0:00:01 (xfr#5, to-chk=16756/22445)
also contains the number of bytes that were copied, which I want to use to show a copying speed.
Below I've provided code doing what you need.
Because I didn't have your example data, I created a simple generator function that emulates the process of retrieving data: on each iteration it sleeps for a random amount of time and then yields a value equal to a random number of bytes (not megabytes).
There are two tqdm bars below. One is used for progress; it measures iterations per second, the total number of iterations done, and the percentage. The second bar measures speed in megabytes per second and the total number of megabytes received.
import tqdm

def gen(cnt):
    import time, random
    for i in range(cnt):
        time.sleep(random.random() * 0.125)
        yield random.randrange(1 << 20)

total_iterations = 150
pbar = tqdm.tqdm(total=total_iterations, ascii=True)
sbar = tqdm.tqdm(unit='B', ascii=True, unit_scale=True)

for e in gen(total_iterations):
    pbar.update()
    sbar.update(e)
Output:
9%|█████████▉ | 89/1000 [00:11<01:53, 8.05it/s]
40.196MiB [00:11, 3.63MiB/s]
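Applied to your rsync loop, a hedged sketch (assuming the first whitespace-separated field of the matched summary line is the transferred byte count, as in your example output):

import re
import subprocess
import tqdm

n = len(open('myfiles.txt').readlines())
your_command = 'rsync -aP --files-from="myfiles.txt" myhost:mysource .'

pbar = tqdm.trange(n)                        # iterations per second
sbar = tqdm.tqdm(unit='B', unit_scale=True)  # bytes per second
process = subprocess.Popen(your_command, stdout=subprocess.PIPE, shell=True)
for line in iter(process.stdout.readline, b''):
    line = line.decode('utf-8')
    if re.match(r'(.*)(xfr\#)([0-9])(.*)(to\-chk\=)([0-9])(.*)', line):
        # e.g. '682,356 100% 496.92kB/s 0:00:01 (xfr#5, to-chk=16756/22445)'
        nbytes = int(line.split()[0].replace(',', ''))
        pbar.update()
        sbar.update(nbytes)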
The task I am trying to achieve is to process thousands of artifacts of different sizes on a multi-core machine. I wish to use ProcessPoolExecutor to distribute the jobs and have each worker tell me which file it is working on.
So far, I have the following:
from concurrent.futures import ProcessPoolExecutor
from itertools import islice, cycle
import time
import tqdm
import multiprocessing
import random

worker_count = min(multiprocessing.cpu_count(), 10)
flist = range(100)
executor = ProcessPoolExecutor(max_workers=worker_count)

with tqdm.tqdm(total=len(flist), leave=False) as t:
    t.set_description_str("Extracting ... ")
    pbars = []
    for idx in range(t.pos + 1, t.pos + 1 + worker_count):
        pbars.append(tqdm.tqdm(position=idx, bar_format='{desc}', leave=False))

    def process(entry):
        artifact, idx = entry
        time.sleep(random.randint(0, worker_count) / 10.0)
        pbars[idx].set_description_str(f'Working on {artifact}', refresh=True)
        return artifact

    for _, _ in zip(flist, executor.map(process, zip(flist, islice(cycle(range(worker_count)), len(flist))))):
        t.update()

    for idx in range(worker_count):
        pbars[idx].set_description_str(" " * (pbars[idx].ncols - 1), refresh=True)
        pbars[idx].clear()
        pbars[idx].close()
Of course, instead of the numbers, I will be displaying the file names.
Now, the questions are:
Is there a more Pythonic way to achieve what I want?
The last bit about clearing the pbars seems obnoxious to me. I do that basically to clean up the terminal when the program finishes. Perhaps there is a better way?
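On the clearing question, one arguably tidier option (a sketch, untested against your exact setup) is to manage all the bars with contextlib.ExitStack, so that leaving the block closes, and with leave=False clears, every bar automatically:

import contextlib
import tqdm

worker_count = 4  # assumed; match your executor
with contextlib.ExitStack() as stack:
    t = stack.enter_context(tqdm.tqdm(total=100, leave=False))
    pbars = [stack.enter_context(
                 tqdm.tqdm(position=i + 1, bar_format='{desc}', leave=False))
             for i in range(worker_count)]
    # ... submit work and update the bars as before ...
# leaving the block closes every bar, removing it from the terminal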
I am testing the parallel capabilities of Python 3, which I intend to use in my code. I observe unexpectedly slow behaviour, so I boiled my code down to the following proof of principle: calculate a simple logarithmic series, once serially and once in parallel using 1 core. One would imagine that the timing for these two examples would be the same, except for a small overhead associated with initializing and closing the multiprocessing.Pool class. However, what I observe is that the overhead grows linearly with problem size, and thus the parallel solution on 1 core is significantly worse relative to the serial solution even for large inputs. Please tell me if I am doing something wrong.
import time
import numpy as np
import multiprocessing
import matplotlib.pyplot as plt

def foo(x):
    return sum([np.log(1 + i*x) for i in range(10)])

def serial_series(rangeMax):
    return [foo(x) for x in range(rangeMax)]

def parallel_series_1core(rangeMax):
    pool = multiprocessing.Pool(processes=1)
    rez = pool.map(foo, tuple(range(rangeMax)))
    pool.terminate()
    pool.join()
    return rez

nTask = [1 + i ** 2 * 1000 for i in range(1, 2)]
nTimeSerial = []
nTimeParallel = []
for taskSize in nTask:
    print('TaskSize', taskSize)
    start = time.time()
    rez = serial_series(taskSize)
    end = time.time()
    nTimeSerial.append(end - start)

    start = time.time()
    rez = parallel_series_1core(taskSize)
    end = time.time()
    nTimeParallel.append(end - start)

plt.plot(nTask, nTimeSerial)
plt.plot(nTask, nTimeParallel)
plt.legend(['serial', 'parallel 1 core'])
plt.show()
Edit:
It was commented that the overhead may be due to creating multiple jobs. Here is a modification of the parallel function that should explicitly make only 1 job. I still observe linear growth of the overhead:
def parallel_series_1core(rangeMax):
    pool = multiprocessing.Pool(processes=1)
    rez = pool.map(serial_series, [rangeMax])
    pool.terminate()
    pool.join()
    return rez
Edit 2: Once more, the exact code that produces linear growth. It can be verified with a print statement inside the serial_series function that it is only called once for each call of parallel_series_1core.
import time
import numpy as np
import multiprocessing
import matplotlib.pyplot as plt

def foo(x):
    return sum([np.log(1 + i*x) for i in range(10)])

def serial_series(rangeMax):
    return [foo(i) for i in range(rangeMax)]

def parallel_series_1core(rangeMax):
    pool = multiprocessing.Pool(processes=1)
    rez = pool.map(serial_series, [rangeMax])
    pool.terminate()
    pool.join()
    return rez

nTask = [1 + i ** 2 * 1000 for i in range(1, 20)]
nTimeSerial = []
nTimeParallel = []
for taskSize in nTask:
    print('TaskSize', taskSize)
    start = time.time()
    rez1 = serial_series(taskSize)
    end = time.time()
    nTimeSerial.append(end - start)

    start = time.time()
    rez2 = parallel_series_1core(taskSize)
    end = time.time()
    nTimeParallel.append(end - start)

plt.plot(nTask, nTimeSerial)
plt.plot(nTask, nTimeParallel)
plt.plot(nTask, [i / j for i, j in zip(nTimeParallel, nTimeSerial)])
plt.legend(['serial', 'parallel 1 core', 'ratio'])
plt.show()
When you use Pool.map() you're essentially telling it to split the passed iterable into jobs over all available sub-processes (which is one in your case): the larger the iterable, the more jobs are created on the first call. That's what initially adds a huge (trumped only by the process creation itself), albeit linear, overhead.
Since sub-processes do not share memory, all changing data on POSIX systems (due to forking), and all data (even static) on Windows, needs to be pickled on one end and unpickled on the other. Plus it takes time to clear out the process stack for the next job, and there is an overhead from system thread switching (which is out of your control; you'd have to mess with the system's scheduler to reduce it).
For simple/quick tasks a single process will always trump multiprocessing.
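If the job-splitting overhead is what you want to isolate, a minimal sketch: pass chunksize to Pool.map explicitly, so the whole iterable is shipped as a single job (note the pickling cost of arguments and results stays the same):

import multiprocessing

def foo(x):
    return x * x

if __name__ == '__main__':
    data = range(100000)
    with multiprocessing.Pool(processes=1) as pool:
        # One chunk for the single worker instead of many small jobs;
        # fewer jobs means less queueing and bookkeeping overhead.
        rez = pool.map(foo, data, chunksize=len(data))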
UPDATE - As I was saying above, the additional overhead comes from the fact that for any data exchange between processes, Python transparently performs a pickling/unpickling routine. Since the list you return from the serial_series() function grows with the task size, so does the performance penalty for pickling/unpickling. Here's a simple demonstration of it based on your code:
import math
import pickle
import time

# multi-platform precision timer (time.clock was removed in Python 3.8;
# time.perf_counter is the modern equivalent)
get_timer = time.perf_counter

def foo(x):  # logic/computation function
    return sum([math.log(1 + i*x) for i in range(10)])

def serial_series(max_range):  # main sub-process function
    return [foo(i) for i in range(max_range)]

def serial_series_slave(max_range):  # sub-process interface
    return pickle.dumps(serial_series(pickle.loads(max_range)))

def serial_series_master(max_range):  # main process interface
    return pickle.loads(serial_series_slave(pickle.dumps(max_range)))

tasks = [1 + i ** 2 * 1000 for i in range(1, 20)]
simulated_times = []
for task in tasks:
    print("Simulated task size: {}".format(task))
    start = get_timer()
    res = serial_series_master(task)
    simulated_times.append((task, get_timer() - start))
At the end, simulated_times will contain something like:
[(1001, 0.010015994115533963), (4001, 0.03402641167313844), (9001, 0.06755546622419131),
(16001, 0.1252664260421834), (25001, 0.18815836740279515), (36001, 0.28339434475444325),
(49001, 0.3757235840503601), (64001, 0.4813749807557435), (81001, 0.6115452710446636),
(100001, 0.7573718332506543), (121001, 0.9228750064147522), (144001, 1.0909038813527427),
(169001, 1.3017281342479343), (196001, 1.4830192955746764), (225001, 1.7117389965616931),
(256001, 1.9392146632682739), (289001, 2.19192682050668), (324001, 2.4497541011649187),
(361001, 2.7481495578097466)]
showing a clearly greater-than-linear increase in processing time as the list grows bigger. This is essentially what happens with multiprocessing; if your sub-process function didn't return anything, it would end up considerably faster.
If you have a large amount of data you need to share among processes, I'd suggest you use some in-memory database (like Redis) and have your sub-processes connect to it to store/retrieve data.
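A minimal sketch of that idea using the redis-py client (assumes a Redis server on localhost and pip install redis; the key names are illustrative):

import multiprocessing
import redis

def worker(task_id):
    # each sub-process opens its own connection
    r = redis.Redis(host='localhost', port=6379)
    result = task_id * task_id  # placeholder for the real computation
    r.set(f'result:{task_id}', result)  # store instead of returning

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        pool.map(worker, range(10))
    r = redis.Redis(host='localhost', port=6379)
    print([int(r.get(f'result:{i}')) for i in range(10)])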