I am trying to speed up my code using Python's multiprocessing module. The problem I ran into is that my function has a return statement and I need to save that data to a list. The best approach I found through Google was to use a queue, putting results with "q.put()" and retrieving them with "q.get()". The issue is that I don't think I'm using it correctly: when I check the command prompt after starting the script, it shows I'm hardly using my CPU and I only see one Python process running. If I remove "q.get()" the script is super fast and does utilize my CPU. Am I doing this the right way?
import time
import numpy as np
import pandas as pd
import multiprocessing
from multiprocessing import Process, Queue


def test(x, y, q):
    q.put(x * y)


if __name__ == '__main__':
    q = Queue()
    one = []
    two = []
    three = []
    start_time = time.time()
    for x in np.arange(30, 60, 1):
        for y in np.arange(0.01, 2, 0.5):
            p = multiprocessing.Process(target=test, args=(x, y, q))
            p.start()
            one.append(q.get())
            two.append(int(x))
            three.append(float(y))
            print(x, ' | ', y, ' | ', one[-1])
            p.join()
    print("--- %s seconds ---" % (time.time() - start_time))
    d = {'x': one, 'y': two, 'q': three}
    data = pd.DataFrame(d)
    print(data.tail())
No, this is not correct. You start a process and then immediately wait for its result with q.get(), so only one process is ever running at a time. If you want to run many tasks in parallel, use multiprocessing.Pool:
import time
import numpy as np
from multiprocessing import Pool
from itertools import product


def test(args):
    x, y = args  # pool.map passes each (x, y) pair as a single argument
    return x, y, x * y


def main():
    start_time = time.time()
    pool = Pool()
    result = pool.map(test, product(np.arange(30, 60, 1), np.arange(0.01, 2, 0.5)))
    pool.close()
    print("--- %s seconds ---" % (time.time() - start_time))
    print(result)


if __name__ == '__main__':
    main()
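If you still need the DataFrame that your original script builds at the end, the (x, y, x * y) tuples returned by pool.map can be loaded straight into pandas; a minimal sketch, with illustrative column names:

import pandas as pd

# result is the list of (x, y, x * y) tuples produced by pool.map above
data = pd.DataFrame(result, columns=['x', 'y', 'product'])
print(data.tail())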
Related
I'm having trouble comparing the return values from each of my multiprocessing runs.
I am using multiprocessing for my function, which returns a value. I want to run the function 5 times and find which process has the lowest return value. My code is below.
def do_processVal():
    getParamInit()
    do_evaluation()
    currbestVal = bestGlobalVal
    return 'Current best value: ', currbestVal, 'for process{}'.format(os.getpid())

from multiprocessing import Pool
import concurrent.futures
from os import getpid
import time
import os

start = time.perf_counter()

with concurrent.futures.ProcessPoolExecutor() as executor:
    results = [executor.submit(do_processVal) for _ in range(5)]

    for f in concurrent.futures.as_completed(results):
        print(f.result())

finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')
Output as of now:
Current best value: 12909.5 for process 21918
Current best value: 12091.5 for process 21920
Current best value: 12350.0 for process 21919
Current best value: 12000.5 for process 21921
Current best value: 11901.0 for process 21922
Finish in 85.86 second(s)
What I want is to take, from the 5 return values above, the data for the lowest value. In this example process 21922 has the lowest value, so I want to assign that value to a parameter:
FinalbestVal = 11901.0
From what I see, you are mixing the responsibilities of your function, and that is causing your issues. The function below returns only the data. The code that runs after the data is collected evaluates the data, and then the data is presented all at once. Separating the responsibilities of your code is a decades-old key to cleaner code. The reason it has been a best practice for so long is exactly the kind of problem you have run into. It also makes code easier to reuse later without having to change it.
def do_processVal():
    getParamInit()
    do_evaluation()
    currbestVal = bestGlobalVal
    return [currbestVal, os.getpid()]

from multiprocessing import Pool
import concurrent.futures
from os import getpid
import time
import os

start = time.perf_counter()

with concurrent.futures.ProcessPoolExecutor() as executor:
    results = [executor.submit(do_processVal) for _ in range(5)]

    best_value = -1
    values_list = []
    for f in concurrent.futures.as_completed(results):
        values = f.result()
        values_list.append(values)
        if best_value == -1 or values[0] < best_value:
            best_value = values[0]

for i in values_list:
    print(f'Current best value: {i[0]} for process {i[1]}')

finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')
print(f'Final best = {best_value}')
If I'm not mistaken, you could simply return currbestVal from do_processVal() instead of the string.
Then you can collect the values and select the minimum:
(...)
values = []
for f in concurrent.futures.as_completed(results):
    values.append(f.result())
print(f"FinalbestVal = {min(values)}")
(...)
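Put together, a minimal self-contained sketch of that idea; do_processVal here is only a stand-in, since getParamInit() and do_evaluation() are not shown:

import concurrent.futures
import random
import time


def do_processVal():
    # stand-in for getParamInit() / do_evaluation(); return only the value
    time.sleep(0.1)
    return random.uniform(11000, 13000)


if __name__ == '__main__':
    start = time.perf_counter()

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(do_processVal) for _ in range(5)]
        values = [f.result() for f in concurrent.futures.as_completed(futures)]

    FinalbestVal = min(values)
    print(f'FinalbestVal = {FinalbestVal}')
    print(f'Finished in {round(time.perf_counter() - start, 2)} second(s)')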
Try this:
def do_processVal():
    getParamInit()
    do_evaluation()
    currbestVal = bestGlobalVal
    return currbestVal, 'for process{}'.format(os.getpid())

from multiprocessing import Pool
import concurrent.futures
from os import getpid
import time
import os

start = time.perf_counter()

with concurrent.futures.ProcessPoolExecutor() as executor:
    results = [executor.submit(do_processVal) for _ in range(5)]

    minimal_value = 1000000
    for f in concurrent.futures.as_completed(results):
        res, s = f.result()
        if res < minimal_value:
            minimal_value = res
        print('Current best value: ' + str(res) + ' ' + s)  # not sure what format s will have, you might need to change that
    print("minimal value: " + str(minimal_value))

finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')
I want to translate a huge MATLAB model to Python, so I need to get the key functions working first. One key function handles parallel processing. Basically, the input is a matrix of parameters in which every row represents the parameters for one run. These parameters are fed into a computation-heavy function that should run in parallel; no run needs the results of any previous run, so all processes can run independently of each other.
Why is starmap_async slower on my PC? Also, when I add more code (to test consecutive computation) Python crashes (I use Spyder). Can you give me advice?
import time
import numpy as np
import multiprocessing as mp
from functools import partial

# Create simulated data matrix
data = np.random.random((100, 3000))
data = np.column_stack((np.arange(1, len(data) + 1, 1), data))


def EAF_DGL(*z, package_num):
    sum_row = 0
    for i in range(1, np.shape(z)[0]):
        sum_row = sum_row + z[i]
    func_result = np.column_stack((package_num, z[0], sum_row))
    return func_result


t0 = time.time()

if __name__ == "__main__":
    package_num = 1
    help_EAF_DGL = partial(EAF_DGL, package_num=1)
    with mp.Pool() as pool:
        #result = pool.starmap(partial(EAF_DGL, package_num), [(data[i]) for i in range(0, np.shape(data)[0])])
        result = pool.starmap_async(help_EAF_DGL, [(data[i]) for i in range(0, np.shape(data)[0])]).get()
        pool.close()
        pool.join()

t1 = time.time()
calculation_time_parallel_async = t1 - t0
print(calculation_time_parallel_async)

t2 = time.time()

if __name__ == "__main__":
    package_num = 1
    help_EAF_DGL = partial(EAF_DGL, package_num=1)
    with mp.Pool() as pool:
        #result = pool.starmap(partial(EAF_DGL, package_num), [(data[i]) for i in range(0, np.shape(data)[0])])
        result = pool.starmap(help_EAF_DGL, [(data[i]) for i in range(0, np.shape(data)[0])])
        pool.close()
        pool.join()

t3 = time.time()
calculation_time_parallel = t3 - t2
print(calculation_time_parallel)
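As a point of comparison for the row-per-run pattern described above, the same work can be dispatched with plain pool.map, passing each row as a single argument; a minimal sketch in which run_one is only a placeholder for the computation-heavy function:

import time
import numpy as np
import multiprocessing as mp


def run_one(row):
    # placeholder for the computation-heavy function; one call per parameter row
    run_index, params = row[0], row[1:]
    return run_index, params.sum()


if __name__ == "__main__":
    data = np.random.random((100, 3000))
    data = np.column_stack((np.arange(1, len(data) + 1), data))

    t0 = time.time()
    with mp.Pool() as pool:
        # chunksize groups several rows per task to cut inter-process overhead
        results = pool.map(run_one, list(data), chunksize=10)
    print("pool.map over rows:", time.time() - t0, "seconds")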
I have two codes. One is a pooled (multiprocessing) version of the other. However, the parallel version takes a long time even with 1 processor, whereas the serial version finishes in ~15 sec. Can someone help accelerate the second version?
Serial
import numpy as np, time

def mapTo(d):
    global tree
    for idx, item in enumerate(list(d), start=1):
        tree[str(item)].append(idx)

data = np.random.randint(1, 4, 20000000)
tree = dict({"1": [], "2": [], "3": []})
s = time.perf_counter()
mapTo(data)
e = time.perf_counter()
print("elapsed time:", e - s)
takes: ~15 sec
Parallel
from multiprocessing import Manager, Pool
from functools import partial
import numpy as np
import time

def mapTo(i_d, tree):
    idx, item = i_d
    l = tree[str(item)]
    l.append(idx)
    tree[str(item)] = l

manager = Manager()
data = np.random.randint(1, 4, 20000000)
# sharedtree = manager.dict({"1": manager.list(), "2": manager.list(), "3": manager.list()})
sharedtree = manager.dict({"1": [], "2": [], "3": []})

s = time.perf_counter()
with Pool(processes=1) as pool:
    pool.map(partial(mapTo, tree=sharedtree), list(enumerate(data, start=1)))
e = time.perf_counter()
print("elapsed time:", e - s)
I was trying the Python multiprocessing module to reduce the run time of my filtering code. As a first step I did some experiments, and the results are not promising.
I defined a function that runs a loop over a certain range, then ran this function with and without threading and measured the time. Here is my code:
import time
from multiprocessing.pool import ThreadPool

def do_loop(i, j):
    l = []
    for i in range(i, j):
        l.append(i)
    return l

# loop variable
x = 7

# without threading
start_time = time.time()
c = do_loop(0, 10**x)
print("--- %s seconds ---" % (time.time() - start_time))

# with threading
def thread_work(n):
    # dividing loop size
    a = 0
    b = int(n/2)
    c = int(n/2)
    # multiprocessing
    pool = ThreadPool(processes=10)
    async_result1 = pool.apply_async(do_loop, (a, b))
    async_result2 = pool.apply_async(do_loop, (b, c))
    async_result3 = pool.apply_async(do_loop, (c, n))
    # get the result from all processes
    result = async_result1.get() + async_result2.get() + async_result3.get()
    return result

start_time = time.time()
ll = thread_work(10**x)
print("--- %s seconds ---" % (time.time() - start_time))
For x=7 the result is:
--- 1.0931916236877441 seconds ---
--- 1.4213247299194336 seconds ---
Without threading it takes less time. And here is another problem: for x=8, most of the time I get a MemoryError with threading. Once I got this result:
--- 17.04124426841736 seconds ---
--- 32.871358156204224 seconds ---
The solution is important as I need to optimize a filtering task which takes 6 hours.
Depending on your task, multiprocessing may or may not take longer.
If you want to take advantage of your CPU cores and speed up your filtering process, then you should use multiprocessing.Pool, which "offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes (data parallelism)".
I've created an example of data filtering and measured the timing of a simple approach versus a multiprocess approach (starting from your code).
# take only the sentences that end in "we are what we dream", the second word is "are"
import time
from multiprocessing.pool import Pool

LEN_FILTER_SENTENCE = len('we are what we dream')
num_process = 10

def do_loop(sentences):
    l = []
    for sentence in sentences:
        if sentence[-LEN_FILTER_SENTENCE:].lower() == 'we are what we doing' and sentence.split()[1] == 'are':
            l.append(sentence)
    return l

# with multiprocessing
def thread_work(sentences):
    pool = Pool(processes=num_process)
    pool_food = (sentences[i: i + num_process] for i in range(0, len(sentences), num_process))
    result = pool.map(do_loop, pool_food)
    return result

def test(data_size=5, sentence_size=100):
    to_be_filtered = ['we are what we doing'*sentence_size] * 10 ** data_size + ['we are what we dream'*sentence_size] * 10 ** data_size
    start_time = time.time()
    c = do_loop(to_be_filtered)
    simple_time = (time.time() - start_time)
    start_time = time.time()
    ll = [e for l in thread_work(to_be_filtered) for e in l]
    multiprocessing_time = (time.time() - start_time)
    assert c == ll
    return simple_time, multiprocessing_time
data_size represents the length of your data and sentence_size is a multiplication factor for each data element; you can see that sentence_size is directly proportional to the number of CPU operations required for each item of your data.
data_size = [1, 2, 3, 4, 5, 6]
results = {i: {'simple_time': [], 'multiprocessing_time': []} for i in data_size}
sentence_size = list(range(1, 500, 100))
for size in data_size:
    for s_size in sentence_size:
        simple_time, multiprocessing_time = test(size, s_size)
        results[size]['simple_time'].append(simple_time)
        results[size]['multiprocessing_time'].append(multiprocessing_time)

import pandas as pd

df_small_data = pd.DataFrame({'simple_data_size_1': results[1]['simple_time'],
                              'simple_data_size_2': results[2]['simple_time'],
                              'simple_data_size_3': results[3]['simple_time'],
                              'multiprocessing_data_size_1': results[1]['multiprocessing_time'],
                              'multiprocessing_data_size_2': results[2]['multiprocessing_time'],
                              'multiprocessing_data_size_3': results[3]['multiprocessing_time'],
                              'sentence_size': sentence_size})

df_big_data = pd.DataFrame({'simple_data_size_4': results[4]['simple_time'],
                            'simple_data_size_5': results[5]['simple_time'],
                            'simple_data_size_6': results[6]['simple_time'],
                            'multiprocessing_data_size_4': results[4]['multiprocessing_time'],
                            'multiprocessing_data_size_5': results[5]['multiprocessing_time'],
                            'multiprocessing_data_size_6': results[6]['multiprocessing_time'],
                            'sentence_size': sentence_size})
Plotting the timing for small data:
ax = df_small_data.set_index('sentence_size').plot(figsize=(20, 10), title = 'Simple vs multiprocessing approach for small data')
ax.set_ylabel('Time in seconds')
Plotting the timing for big data (relatively big data):
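By analogy with the small-data plot above, the call for the big-data frame would presumably be:

ax = df_big_data.set_index('sentence_size').plot(figsize=(20, 10), title='Simple vs multiprocessing approach for big data')
ax.set_ylabel('Time in seconds')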
As you can see, the power of multiprocessing reveals itself when you have big data that requires relatively significant CPU work for each data element.
The task here is so small that the parallelization overhead hugely dominates over the benefits. This is a common FAQ.
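A quick way to see the overhead effect is to time the same serial and pooled runs with a trivial task and with a heavier one; the pool only starts to win once each call does enough work. A minimal, illustrative sketch (not the original poster's code):

import time
from multiprocessing import Pool


def trivial(i):
    # almost no work per call, so process overhead dominates
    return i + 1


def heavier(i):
    # enough work per call for the pool to pay off
    return sum(j * j for j in range(100000))


if __name__ == '__main__':
    n = 1000
    for func in (trivial, heavier):
        t0 = time.time()
        serial = [func(i) for i in range(n)]
        t_serial = time.time() - t0

        t0 = time.time()
        with Pool() as pool:
            parallel = pool.map(func, range(n))
        t_pool = time.time() - t0

        print(f"{func.__name__}: serial {t_serial:.3f}s, pool {t_pool:.3f}s")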
It's better to use multiprocessing.Process(), because Python has the Global Interpreter Lock (GIL). Even if you create threads to speed up your tasks, CPU-bound work won't run faster; the threads execute one at a time. You can refer to the Python docs for the GIL and threading.
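Applied to the code in the question, that suggestion would mean swapping multiprocessing.pool.ThreadPool for the process-based multiprocessing.Pool; a minimal sketch of how that could look, keeping the original do_loop logic (whether it actually helps here still depends on the overhead point made above):

import time
from multiprocessing import Pool


def do_loop(i, j):
    # equivalent to the appending loop in the question
    return list(range(i, j))


if __name__ == '__main__':
    x = 7
    n = 10 ** x
    start_time = time.time()
    with Pool(processes=3) as pool:
        # split the range into three disjoint parts, one task per part
        parts = [pool.apply_async(do_loop, (a, b))
                 for a, b in [(0, n // 3), (n // 3, 2 * n // 3), (2 * n // 3, n)]]
        result = [item for part in parts for item in part.get()]
    print("--- %s seconds ---" % (time.time() - start_time))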
Aroosh Rana may have the best answer, but when testing with that approach there are a couple of things to watch out for. The way you grow your array in the loop can be very inefficient; consider allocating its full size up front instead. Also, look closely at how you divided up the work: you have two calls that each process half the array and one that goes from n/2 to n/2 (i.e., does nothing). And, as mentioned elsewhere, the work done is rather trivial and wouldn't benefit from parallel processing.
I've tried to improve upon your previous test.
import time
from multiprocessing.pool import ThreadPool
import math

def do_loop(array, i, j):
    for k in range(i, j):
        array[k] = math.cos(1/(1+k))
    return array

# loop variable
x = 7
array_size = 2*10**x

# without threading
start_time = time.time()
array = [0]*array_size
c = do_loop(array, 0, array_size)
print("--- %s seconds ---" % (time.time() - start_time))

# with threading
def thread_work(n):
    # dividing loop size
    array = [0]*n
    a = 0
    b = int(n/3)
    c = int(2*n/3)
    # multiprocessing
    pool = ThreadPool(processes=4)
    async_result1 = pool.apply_async(do_loop, (array, a, b))
    async_result2 = pool.apply_async(do_loop, (array, b, c))
    async_result3 = pool.apply_async(do_loop, (array, c, n))
    # get the result from all processes
    result1 = async_result1.get()
    result2 = async_result2.get()
    result3 = async_result3.get()

    start_time = time.time()
    result = result1 + result2 + result3
    print("--- %s seconds ---" % (time.time() - start_time))
    return result

start_time = time.time()
ll = thread_work(array_size)
print("--- %s seconds ---" % (time.time() - start_time))
Also keep in mind that with an approach like this you wouldn't have to combine the results at the end as each thread would be processing on the same array.
Why do you use multiprocessing for threads?
Best is to create multiple thread instances, give each of them its task, then start all of them and wait until they finish, collecting the results into some list in the meantime.
From my experience (on one particular task), I have found that creating the whole graph of threads up front gives less overhead than creating each thread directly before its task (the next node in the graph) starts. I mean this for 10, 100, 1000, and 10000 threads. Just make sure the threads sleep during idle time, e.g. time.sleep(0.5), to avoid wasting CPU cycles.
With threads, you can use lists, dictionaries, and queues, which are thread-safe.
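A minimal sketch of that pattern with the standard threading module, collecting results into a shared list (the worker here is only a placeholder):

import threading
import time


results = []                      # plain list; a single append is thread-safe under the GIL
results_lock = threading.Lock()   # still useful if you do more than one operation at a time


def worker(task_id):
    # placeholder task; replace with real work
    time.sleep(0.1)
    value = task_id * task_id
    with results_lock:
        results.append((task_id, value))


if __name__ == '__main__':
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
    for t in threads:      # start all of them
        t.start()
    for t in threads:      # wait until they finish
        t.join()
    print(sorted(results))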
I was playing with multiprocessing in Python. I'm trying to distribute calculations on arrays to multiple CPU cores. To do that, I fork as many processes as multiprocessing.cpu_count() returns and I pass subsets of the array to the processes (by partitioning the array indices). The array is operated on as a shared memory object.
However, for varying array sizes I don't see any runtime improvement. Why is that?
This is just a toy example; I'm not trying to achieve anything specific with these calculations.
import multiprocessing as mp
import numpy as np
import time
import sharedmem

def some_function_mult(q, arr, index, width):
    q.put((sum(arr[index:index+width])/np.amax(arr[index:index+width])**2)/40)

def some_function(arr, index, width):
    return sum((arr[index:index+width])/np.amax(arr[index:index+width])**2)/40

def main():
    num = mp.cpu_count()
    n = 200000000
    width = n/num
    random_array = np.random.randint(0, 255, n)
    shared = sharedmem.empty(n)
    shared[:] = random_array
    print(shared)

    queue = mp.Queue()
    processes = [mp.Process(target=some_function_mult, args=(queue, shared, i*width, width)) for i in xrange(num)]

    start_time = time.time()
    for p in processes:
        p.start()

    result = []
    for p in processes:
        result.append(queue.get())

    for p in processes:
        p.join()

    end_time = time.time()
    print('Multiprocessing execution time = ' + str(end_time-start_time))
    print(result)

    result = []
    start_time = time.time()
    for i in range(num):
        result.append(some_function(random_array, i*width, width))
    end_time = time.time()
    print('Sequential processing time = ' + str(end_time-start_time))
    print(result)

if __name__ == '__main__':
    main()