The task I am trying to achieve is to process thousands of artifacts of different sizes on a multi-core machine. I want to use ProcessPoolExecutor to distribute the jobs and have each worker tell me which file it is working on.
So far, I have the following:
from concurrent.futures import ProcessPoolExecutor
from itertools import islice, cycle
import time
import tqdm
import multiprocessing
import random
worker_count = min(multiprocessing.cpu_count(), 10)
flist = range(100)
executor = ProcessPoolExecutor(max_workers=worker_count)
with tqdm.tqdm(total=len(flist), leave=False) as t:
    t.set_description_str("Extracting ... ")
    pbars = []
    for idx in range(t.pos + 1, t.pos + 1 + worker_count):
        pbars.append(tqdm.tqdm(position=idx, bar_format='{desc}', leave=False))
    def process(entry):
        artifact, idx = entry
        time.sleep(random.randint(0, worker_count)/10.0)
        pbars[idx].set_description_str(f'Working on {artifact}', refresh=True)
        return artifact
    for _, _ in zip(flist, executor.map(process, zip(flist, islice(cycle(range(worker_count)), len(flist))))):
        t.update()
    for idx in range(worker_count):
        pbars[idx].set_description_str(" "*(pbars[idx].ncols - 1), refresh=True)
        pbars[idx].clear()
        pbars[idx].close()
Of course, instead of the numbers, I will be displaying the file names.
Now, the questions are:
Is there a better, more Pythonic way to achieve what I want?
The last bit, clearing the pbars, seems obnoxious to me. I do it only to clean up the terminal when the program finishes. Is there a better way?
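Here is one possible restructuring. It is a sketch under some assumptions, not the definitive answer.

In the code above, each worker process updates its own copy of pbars. The index taken from cycle(range(worker_count)) also follows the input position rather than the worker, so two workers can write to the same line.

The sketch below keeps all terminal output in the parent process instead:

- Each worker reports (os.getpid(), artifact) through a Manager queue, which it receives through the executor's initializer.
- The parent gives every worker pid its own tqdm line.
- Those lines are created with leave=False, so closing them should erase them. This replaces the manual blank-and-clear loop.

The names process, _init_worker, _drain and run, and the file names, are illustrative. The sketch assumes Python 3.7+ (for initializer) and tqdm.

```python
import multiprocessing as mp
import os
import queue
import random
import time
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait

from tqdm import tqdm

_status_q = None  # set once in every worker process by _init_worker


def _init_worker(status_q):
    global _status_q
    _status_q = status_q


def process(artifact):
    # Tell the parent which artifact this worker (identified by its pid) has started.
    _status_q.put((os.getpid(), artifact))
    time.sleep(random.random() / 10)  # placeholder for the real extraction work
    return artifact


def _drain(status_q, lines):
    """Move every pending status message onto that worker's one-line tqdm display."""
    while True:
        try:
            pid, artifact = status_q.get_nowait()
        except queue.Empty:
            return
        if pid not in lines:
            # First message from this worker: give it the next free line below the main bar.
            lines[pid] = tqdm(position=len(lines) + 1, bar_format="{desc}", leave=False)
        lines[pid].set_description_str(f"Working on {artifact}")


def run(artifacts, worker_count=4):
    results = []
    lines = {}  # worker pid -> tqdm instance used as a plain text line
    with mp.Manager() as manager:
        status_q = manager.Queue()
        with tqdm(total=len(artifacts), desc="Extracting", leave=False) as main_bar, \
                ProcessPoolExecutor(worker_count, initializer=_init_worker,
                                    initargs=(status_q,)) as executor:
            pending = {executor.submit(process, a) for a in artifacts}
            while pending:
                # Wake up at least every 0.1 s to refresh the status lines.
                done, pending = wait(pending, timeout=0.1, return_when=FIRST_COMPLETED)
                _drain(status_q, lines)
                for future in done:
                    results.append(future.result())
                    main_bar.update()
            for line in lines.values():
                line.close()  # created with leave=False, so closing erases the line
    return results


if __name__ == "__main__":
    print(sorted(run([f"file_{i}.bin" for i in range(40)])))
```

Results come back in completion order, so sort them, or key them by file name, if order matters.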
Related
I am currently trying to parallelize a rather large task: computing a complex system of differential equations. I want to parallelize the computation so that each computation has its own process. I need the results to be ordered, so I am using a dictionary to order them after the processes finish. I am also on Windows 10.
For now I am only running the identity function to check the code, but even then it just runs all logical cores at 100% and never finishes (I waited 5 minutes).
Later on I will need to initialize each process with a bunch of variables to compute the actual system, which is defined in a solver() function further up in the code.
What is going wrong?
import multiprocessing as mp
import numpy as np
Nmin = 0
Nmax = 20
periods = np.linspace(Nmin, Nmax, 2*Nmax + 1)  # 0.5 steps
results = dict()
def identity(a):
    return a
with mp.Manager() as manager:
    sharedresults = manager.dict()
    with mp.Pool() as pool:
        print("pools are active")
        for result in pool.map(identity, periods):
            #sharedresults[per] = res
            print(result)
orderedResult = []
for k, v in sorted(results.items()):
    orderedResult.append(v)
The program reaches the "pools are active" message, prints it, and then appears to do nothing.
I am also using JupyterLab; I'm not sure whether that is an issue.
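There's a problem with multiprocessing in JupyterLab, so you should use pathos instead. On Windows, worker processes are started with spawn and have to import the worker function by name from __main__. That fails for functions defined interactively in a notebook. pathos serializes functions with dill, which can send them by value.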
import multiprocessing as mp
import numpy as np
import scipy.constants as constants
from concurrent.futures import ProcessPoolExecutor
import pathos.multiprocessing as mpathos
Nmin = 0
Nmax = 20
periods = np.linspace(Nmin, Nmax, 2*Nmax + 1)  # 0.5 steps
results = dict()
def identity(a):
    return a
with mp.Manager() as manager:
    sharedresults = manager.dict()
    with mpathos.Pool() as pool:
        print("pools are active")
        for result in pool.imap(identity, periods):
            #sharedresults[per] = res
            print(result)
orderedResult = []
for k, v in sorted(results.items()):
    orderedResult.append(v)
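For comparison, here is a sketch using only the standard library. It assumes the work runs from a .py script. In a notebook, the worker functions would need to live in a separate module that the notebook imports, since spawned workers on Windows must be able to import them.

It also addresses two points from the question:

- Pool.map already returns results in the same order as its input, so no dictionary is needed to restore the order.
- Per-process setup can be done once through Pool's initializer/initargs.

The params dict and the body of solve are hypothetical stand-ins for the real solver().

```python
import multiprocessing as mp

_params = None  # per-worker state, filled in once by _init_worker


def _init_worker(params):
    # Runs once in each worker process when the pool starts it.
    global _params
    _params = params


def solve(period):
    # Hypothetical stand-in for the real solver(); it only reads the per-worker params.
    return period * _params["scale"]


def run(params):
    periods = [0.5 * i for i in range(41)]  # 0.0, 0.5, ..., 20.0, like np.linspace(0, 20, 41)
    with mp.Pool(initializer=_init_worker, initargs=(params,)) as pool:
        values = pool.map(solve, periods)  # map returns results in input order
    return dict(zip(periods, values))


if __name__ == "__main__":
    results = run({"scale": 2.0})
    print(results[10.0])
```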
I want to parallelize a task (progresser()) over a range of input parameters (L). The progress of each task should be monitored by an individual progress bar in the terminal. I'm using the tqdm package for the progress bars. The following code works on my Mac for up to 23 progress bars (L = list(range(23)) and below), but produces chaotic jumping of the progress bars starting at L = list(range(24)). Does anyone have an idea how to fix this?
from time import sleep
import random
from tqdm import tqdm
from multiprocessing import Pool, freeze_support, RLock
L = list(range(24)) # works until 23, breaks starting at 24
def progresser(n):
    text = f'#{n}'
    sampling_counts = 10
    with tqdm(total=sampling_counts, desc=text, position=n+1) as pbar:
        for i in range(sampling_counts):
            sleep(random.uniform(0, 1))
            pbar.update(1)
if __name__ == '__main__':
    freeze_support()
    p = Pool(processes=None,
             initargs=(RLock(),), initializer=tqdm.set_lock
             )
    p.map(progresser, L)
    print('\n' * (len(L) + 1))
As an example of what it should generally look like, I provide a screenshot for L = list(range(16)) below.
versions: python==3.7.3, tqdm==4.32.1
I'm not getting any jumping when I set the size to 30. Maybe you have more processors and can run more workers.
However, if n grows large, you will start to see jumps because of how chunksize works.
That is:
p.map splits your input into chunks and gives each process a chunk. So as n grows larger, so does your chunksize, and so does your ... yup, position (pos=n+1)!
Note: although map preserves the order of the returned results, the order in which they are computed is arbitrary.
As n grows large, I would suggest using the worker process's id (current_process()._identity in the code below) as the position, to view progress on a per-process basis.
from time import sleep
import random
from tqdm import tqdm
from multiprocessing import Pool, freeze_support, RLock
from multiprocessing import current_process
def progresser(n):
    text = f'#{n}'
    sampling_counts = 10
    current = current_process()
    pos = current._identity[0]-1
    with tqdm(total=sampling_counts, desc=text, position=pos) as pbar:
        for i in range(sampling_counts):
            sleep(random.uniform(0, 1))
            pbar.update(1)
if __name__ == '__main__':
    freeze_support()
    L = list(range(30)) # works until 23, breaks starting at 24
    # p = Pool(processes=None,
    #          initargs=(RLock(),), initializer=tqdm.set_lock
    #          )
    with Pool(initializer=tqdm.set_lock, initargs=(tqdm.get_lock(),)) as p:
        p.map(progresser, L)
    print('\n' * (len(L) + 1))
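To put numbers on the chunking argument: when chunksize is not given, Pool.map picks one from the input length and the number of workers. The helper below mirrors the formula in CPython's Lib/multiprocessing/pool.py. That formula is an implementation detail rather than a documented guarantee, and the helper name is hypothetical. Passing chunksize=1 to p.map explicitly makes the pool hand out one n at a time.

```python
def default_map_chunksize(n_items, n_workers):
    """Chunksize Pool.map uses when chunksize=None, following CPython's pool.py."""
    if n_items == 0:
        return 0
    # Aim for roughly four chunks per worker, rounding up.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


for n_items in (24, 30, 1000):
    print(f"{n_items} items, 8 workers -> chunksize {default_map_chunksize(n_items, 8)}")
```

With 8 workers, 24 items give a chunksize of 1, while 1000 items give 32.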
I'm trying to add a progress bar to my program; however, solutions that seem to work for others (in other posts) do not work for me.
Python version 3.6.
import multiprocessing as mp
import tqdm
def f(dynamic, fix1, fix2):
return dynamic + fix1 + fix2
N = 2
fix1 = 5
fix2= 10
dynamic = range(10)
p = mp.Pool(processes = N)
for _ in tqdm.tqdm(p.starmap(f, [(d, fix1, fix2) for d in dynamic]), total = len(dynamic)):
    pass
p.close()
p.join()
Any idea why the multiprocessing works (the computation is done), but there is no progress bar?
NB: The example above is a dummy; my actual functions are different.
Another question: how can I properly interrupt a multiprocessing program? The Ctrl+C that I usually use in single-threaded code seems to cause some issues.
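Unfortunately, tqdm does not work with starmap. starmap blocks until every result has been computed and then returns a finished list, so tqdm has nothing to report while the work is running. You can use imap_unordered instead, which yields results as they complete: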
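def f(args):
    arg1, arg2 = args  # imap_unordered passes one tuple, so unpack it yourself
    result = ...  # do something with arg1, arg2
    return result
# pool, list_of_args, list_of_args2 and total come from your own code
for _ in tqdm.tqdm(pool.imap_unordered(f, zip(list_of_args, list_of_args2)), total=total):
    pass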
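Here is a fuller sketch built on the question's own f, assuming tqdm is installed:

- functools.partial binds fix1 and fix2, so imap only has to feed dynamic.
- imap yields results in input order as they become available, so the bar advances while the work runs.

For the Ctrl+C question, one common pattern (shown here, not the only option) is to make the workers ignore SIGINT through the pool initializer, so that only the parent receives KeyboardInterrupt. Leaving the with block, either normally or through that exception, calls pool.terminate(). The helper names _ignore_sigint and run are hypothetical.

```python
import multiprocessing as mp
import signal
from functools import partial

from tqdm import tqdm


def f(dynamic, fix1, fix2):
    return dynamic + fix1 + fix2


def _ignore_sigint():
    # Workers ignore Ctrl+C so that only the parent process sees KeyboardInterrupt.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def run(dynamic, fix1=5, fix2=10, processes=2):
    # partial binds the fixed arguments, so imap only has to supply `dynamic`.
    worker = partial(f, fix1=fix1, fix2=fix2)
    # Exiting the with-block (normally or via KeyboardInterrupt) calls pool.terminate().
    with mp.Pool(processes, initializer=_ignore_sigint) as pool:
        # imap hands results back in input order while later tasks are still running,
        # so tqdm can advance during the computation.
        return list(tqdm(pool.imap(worker, dynamic), total=len(dynamic)))


if __name__ == "__main__":
    print(run(range(10)))
```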
I am running a multiprocessing pool in a for loop over chunks of data. It runs fine for two iterations and hangs on the third. If I reduce the size of each chunk, it hangs later, perhaps on the fourth or fifth iteration. In the program where I discovered the problem I am running a more extensive function, but this code is enough to reproduce the error.
Is there a proper way to terminate a pool after it is finished? So that I can start it again.
import pandas as pd
import numpy as np
from multiprocess import Pool
df = pd.read_csv('paths.csv')
def do_something(user):
    v = df[df['userId'] == user]
    return v
if __name__ == '__main__':
    users = df['userId'].unique()
    n_chunks = round(len(users)/40)
    subsets = [users[i:i+n_chunks] for i in range(0, len(users), n_chunks)]
    chunk_counter = 0
    for user_subset in subsets:
        chunk_counter += 1
        print(f'Beginning to process chunk {chunk_counter}...')
        with Pool() as pool:
            frames = pool.map(do_something, user_subset)
            pool.close()
            pool.terminate()
        print(f'Completed processing chunk {chunk_counter}.')
I was able to prevent the hanging with the code below:
with Pool(maxtasksperchild=1) as pool:
    frames = pool.map_async(do_something, user_subset).get()
    pool.terminate()
    pool.join()
I don't understand why using map_async would prevent the hanging. I will dive deeper if I have a chance and update if I understand the reason.
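The cause of the hang is not pinned down above. This is only a sketch of a pattern that sidesteps it, not an explanation: create one pool before the loop and reuse it for every chunk, instead of starting and tearing down a new pool per chunk.

A few notes on the sketch:

- Inside a with Pool() block, the extra close()/terminate() calls are unnecessary, because leaving the block already calls terminate().
- The chunk size is guarded with max(1, ...), because round(len(users)/40) is 0 for 20 or fewer users, and range() rejects a step of 0.
- do_something is a hypothetical stand-in that doubles a number instead of filtering a DataFrame, so the sketch runs without pandas or paths.csv.

```python
from multiprocessing import Pool


def do_something(user):
    # Hypothetical stand-in for the per-user DataFrame filtering in the question.
    return user * 2


def process_all(users, n_chunks=40):
    # Same sizing rule as the question, but never smaller than 1 item per chunk.
    size = max(1, round(len(users) / n_chunks))
    subsets = [users[i:i + size] for i in range(0, len(users), size)]
    frames = []
    # One pool for the whole run: its workers are started once and reused for every chunk.
    with Pool() as pool:
        for chunk_counter, user_subset in enumerate(subsets, start=1):
            print(f"Beginning to process chunk {chunk_counter}...")
            frames.extend(pool.map(do_something, user_subset))
            print(f"Completed processing chunk {chunk_counter}.")
    return frames


if __name__ == "__main__":
    print(len(process_all(list(range(1000)))))
```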
I am testing the parallel capabilities of Python 3, which I intend to use in my code. I observed unexpectedly slow behaviour, so I boiled my code down to the following proof of principle. Let's calculate a simple logarithmic series, once serially and once in parallel using 1 core. One would expect the timings of these two examples to be the same, except for a small overhead associated with initializing and closing the multiprocessing.Pool class. However, what I observe is that the overhead grows linearly with problem size, so the parallel solution on 1 core is significantly worse than the serial solution even for large inputs. Please tell me if I am doing something wrong.
import time
import numpy as np
import multiprocessing
import matplotlib.pyplot as plt
def foo(x):
    return sum([np.log(1 + i*x) for i in range(10)])
def serial_series(rangeMax):
    return [foo(x) for x in range(rangeMax)]
def parallel_series_1core(rangeMax):
    pool = multiprocessing.Pool(processes=1)
    rez = pool.map(foo, tuple(range(rangeMax)))
    pool.terminate()
    pool.join()
    return rez
nTask = [1 + i ** 2 * 1000 for i in range(1, 2)]
nTimeSerial = []
nTimeParallel = []
for taskSize in nTask:
    print('TaskSize', taskSize)
    start = time.time()
    rez = serial_series(taskSize)
    end = time.time()
    nTimeSerial.append(end - start)
    start = time.time()
    rez = parallel_series_1core(taskSize)
    end = time.time()
    nTimeParallel.append(end - start)
plt.plot(nTask, nTimeSerial)
plt.plot(nTask, nTimeParallel)
plt.legend(['serial', 'parallel 1 core'])
plt.show()
Edit:
It was suggested in the comments that the overhead may be due to creating multiple jobs. Here is a modification of the parallel function that should explicitly create only 1 job. I still observe linear growth of the overhead.
def parallel_series_1core(rangeMax):
    pool = multiprocessing.Pool(processes=1)
    rez = pool.map(serial_series, [rangeMax])
    pool.terminate()
    pool.join()
    return rez
Edit 2: Once more, here is the exact code that produces linear growth. A print statement inside the serial_series function confirms that it is called only once per call of parallel_series_1core.
import time
import numpy as np
import multiprocessing
import matplotlib.pyplot as plt
def foo(x):
    return sum([np.log(1 + i*x) for i in range(10)])
def serial_series(rangeMax):
    return [foo(i) for i in range(rangeMax)]
def parallel_series_1core(rangeMax):
    pool = multiprocessing.Pool(processes=1)
    rez = pool.map(serial_series, [rangeMax])
    pool.terminate()
    pool.join()
    return rez
nTask = [1 + i ** 2 * 1000 for i in range(1, 20)]
nTimeSerial = []
nTimeParallel = []
for taskSize in nTask:
    print('TaskSize', taskSize)
    start = time.time()
    rez1 = serial_series(taskSize)
    end = time.time()
    nTimeSerial.append(end - start)
    start = time.time()
    rez2 = parallel_series_1core(taskSize)
    end = time.time()
    nTimeParallel.append(end - start)
plt.plot(nTask, nTimeSerial)
plt.plot(nTask, nTimeParallel)
plt.plot(nTask, [i / j for i, j in zip(nTimeParallel, nTimeSerial)])
plt.legend(['serial', 'parallel 1 core', 'ratio'])
plt.show()
When you use Pool.map(), you're essentially telling it to split the passed iterable into jobs across all available sub-processes (one in your case). The larger the iterable, the more 'jobs' are created on the first call. That is what initially adds a huge (trumped only by the process creation itself), albeit linear, overhead.
Since sub-processes do not share memory, Python has to pickle data on one end and unpickle it on the other. On POSIX systems this applies to all data that changes, since unchanged data is inherited through forking. On Windows it applies to all data, even static data. On top of that, it needs time to clear out the process stack for the next job, and there is overhead from system thread switching. That part is out of your control; you'd have to tune the system's scheduler to reduce it.
For simple/quick tasks a single process will always trump multiprocessing.
UPDATE: As I was saying above, the additional overhead comes from the fact that, for any data exchange between processes, Python transparently runs a pickling/unpickling routine. The list you return from serial_series() grows with the task size, and so does the performance penalty for pickling/unpickling it. Here's a simple demonstration based on your code:
import math
import pickle
import time
# multi-platform precision timer (time.clock was removed in Python 3.8)
get_timer = time.perf_counter
def foo(x):  # logic/computation function
    return sum([math.log(1 + i*x) for i in range(10)])
def serial_series(max_range):  # main sub-process function
    return [foo(i) for i in range(max_range)]
def serial_series_slave(max_range):  # subprocess interface
    return pickle.dumps(serial_series(pickle.loads(max_range)))
def serial_series_master(max_range):  # main process interface
    return pickle.loads(serial_series_slave(pickle.dumps(max_range)))
tasks = [1 + i ** 2 * 1000 for i in range(1, 20)]
simulated_times = []
for task in tasks:
    print("Simulated task size: {}".format(task))
    start = get_timer()
    res = serial_series_master(task)
    simulated_times.append((task, get_timer() - start))
At the end, simulated_times will contain something like:
[(1001, 0.010015994115533963), (4001, 0.03402641167313844), (9001, 0.06755546622419131),
(16001, 0.1252664260421834), (25001, 0.18815836740279515), (36001, 0.28339434475444325),
(49001, 0.3757235840503601), (64001, 0.4813749807557435), (81001, 0.6115452710446636),
(100001, 0.7573718332506543), (121001, 0.9228750064147522), (144001, 1.0909038813527427),
(169001, 1.3017281342479343), (196001, 1.4830192955746764), (225001, 1.7117389965616931),
(256001, 1.9392146632682739), (289001, 2.19192682050668), (324001, 2.4497541011649187),
(361001, 2.7481495578097466)]
This shows the processing time growing steadily, roughly in proportion to the list size, as the list gets bigger. This is essentially what happens with multiprocessing: if your sub-process function didn't return anything, it would end up considerably faster.
If you have a large amount of data you need to share among processes, I'd suggest using an in-memory database (like Redis) and having your sub-processes connect to it to store and retrieve data.
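One caveat on the simulation above: serial_series_master times the computation and the pickling together, so it cannot show which of the two grows. The sketch below times them separately, reusing foo and serial_series from the answer. time_call and pickle_round_trip are hypothetical helper names, and the printed numbers will depend on the machine.

```python
import math
import pickle
import time


def foo(x):  # logic/computation function, as in the answer
    return sum([math.log(1 + i*x) for i in range(10)])


def serial_series(max_range):
    return [foo(i) for i in range(max_range)]


def time_call(fn, *args):
    """Return (result, elapsed seconds) for a single call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def pickle_round_trip(obj):
    # Roughly what a Pool does to each returned result: dumps in the worker, loads in the parent.
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    for n in (1_001, 16_001, 100_001):
        data, compute_s = time_call(serial_series, n)
        _, pickle_s = time_call(pickle_round_trip, data)
        print(f"{n:>7} items: compute {compute_s:.4f} s, pickle round trip {pickle_s:.4f} s")
```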