I have code like this:
import numpy as np
from multiprocessing import Process, Pool

x = 3
y = 3
z = 10
ar = np.zeros((x, y, z))

para = []
process = []

def local_func(section):
    print "section %s" % str(section)
    ar[2, 2, section] = 255
    print "value set %d" % ar[2, 2, section]

pool = Pool(1)
run_list = range(0, 10)
list_of_results = pool.map(local_func, run_list)
print ar
The values in ar were not changed by the multithreading. What might be wrong? Thanks.
You're using multiple processes here, not multiple threads. Because of that, each instance of local_func gets its own separate copy of ar. You can use a custom Manager to create a shared numpy array, which you can pass to each child process and get the results you expect:
import numpy as np
from functools import partial
from multiprocessing import Process, Pool
import multiprocessing.managers

x = 3
y = 3
z = 10

class MyManager(multiprocessing.managers.BaseManager):
    pass

# Expose np.zeros through the manager so the array lives in the manager's
# process and every worker operates on the same proxied object
MyManager.register('np_zeros', np.zeros, multiprocessing.managers.ArrayProxy)

para = []
process = []

def local_func(ar, section):
    print "section %s" % str(section)
    ar[2, 2, section] = 255
    print "value set %d" % ar[2, 2, section]

if __name__ == "__main__":
    m = MyManager()
    m.start()
    ar = m.np_zeros((x, y, z))

    pool = Pool(1)
    run_list = range(0, 10)
    # Bind the shared array as the first argument so pool.map only has to
    # supply the section index
    func = partial(local_func, ar)
    list_of_results = pool.map(func, run_list)
    print ar
Well, multi-threading and multi-processing are different things.
With multi-threading, threads share access to the same array.
With multi-processing, each process has its own copy of the array.
multiprocessing.Pool is a process pool, not a thread pool.
If you want a thread pool, use multiprocessing.pool.ThreadPool:
Replace:
from multiprocessing import Pool
with:
from multiprocessing.pool import ThreadPool as Pool
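For example, a minimal sketch of the original snippet with the thread pool swapped in behaves as expected, because threads share the process's memory:
import numpy as np
from multiprocessing.pool import ThreadPool as Pool

x = 3
y = 3
z = 10
ar = np.zeros((x, y, z))

def local_func(section):
    # All threads see the same ar, so the writes are visible afterwards
    ar[2, 2, section] = 255

pool = Pool(1)
pool.map(local_func, range(10))
pool.close()
pool.join()

print(ar)  # ar[2, 2, :] is now 255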
Related
Is it possible to have a progress bar with map_async from multiprocessing?
toy example:
from multiprocessing import Pool
import tqdm

def f(x):
    print(x)
    return x*x

n_job = 4

with Pool(processes=n_job) as pool:
    results = pool.map_async(f, range(10)).get()

print(results)
I am looking for something like this:
data = []
with Pool(processes=10) as pool:
    for d in tqdm.tqdm(
            pool.imap(f, range(10)),
            total=10):
        data.append(d)
There are a couple of ways of achieving what you want that I can think of:
Use apply_async with a callback argument to update the progress bar as each result becomes available.
Use imap and as you iterate the results you can update the progress bar.
There is a slight problem with imap: the results must be returned in task-submission order, which is of course what you want, but that order does not necessarily reflect the order in which the submitted tasks complete, so the progress bar may not update as frequently as it otherwise might. Still, I will show that solution first, since it is the simplest and probably adequate:
from multiprocessing import Pool
import tqdm

def f(x):
    import time
    time.sleep(1)  # for demo purposes
    return x*x

# Required by Windows:
if __name__ == '__main__':
    pool_size = 4
    results = []
    with Pool(processes=pool_size) as pool:
        with tqdm.tqdm(total=10) as pbar:
            for result in pool.imap(f, range(10)):
                results.append(result)
                pbar.update()
    print(results)
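If result order does not matter, pool.imap_unordered yields results in completion order, so the bar advances as soon as any task finishes; a minimal sketch along the same lines:
from multiprocessing import Pool
import tqdm

def f(x):
    import time
    time.sleep(1)  # for demo purposes
    return x*x

if __name__ == '__main__':
    results = []
    with Pool(processes=4) as pool:
        with tqdm.tqdm(total=10) as pbar:
            # Results arrive as tasks complete, not in submission order
            for result in pool.imap_unordered(f, range(10)):
                results.append(result)
                pbar.update()
    print(results)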
The solution that uses apply_async:
from multiprocessing import Pool
import tqdm

def f(x):
    import time
    time.sleep(1)  # for demo purposes
    return x*x

# Required by Windows:
if __name__ == '__main__':
    def my_callback(_):
        # We don't care about the actual result.
        # Just update the progress bar:
        pbar.update()

    pool_size = 4
    with Pool(processes=pool_size) as pool:
        with tqdm.tqdm(total=10) as pbar:
            async_results = [pool.apply_async(f, args=(x,), callback=my_callback) for x in range(10)]
            results = [async_result.get() for async_result in async_results]
    print(results)
I think this is it:
from multiprocessing import Pool
import tqdm

def f(x):
    return x*x

n_job = 4

data = []
with Pool(processes=n_job) as pool:
    for d in tqdm.tqdm(
            pool.map_async(f, range(10)).get(),
            total=10):
        data.append(d)

print(data)
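If depending on tqdm's own helpers is acceptable, tqdm.contrib.concurrent.process_map wraps the worker pool and the progress bar in a single call; a minimal sketch (process_map is part of tqdm, not multiprocessing, and needs a reasonably recent tqdm):
from tqdm.contrib.concurrent import process_map

def f(x):
    return x*x

if __name__ == '__main__':
    # process_map manages the pool and advances the bar as tasks finish
    results = process_map(f, range(10), max_workers=4)
    print(results)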
I can't seem to figure out why my results are not appending while using the multiprocessing package.
I've looked at many similar questions but can't seem to figure out what I'm doing wrong. This is my first attempt at multiprocessing (as you might be able to tell), so I don't quite understand all the jargon in the documentation, which might be part of the problem.
Running this in PyCharm prints an empty list instead of the desired list of row sums.
import numpy as np
from multiprocessing import Pool
import timeit

data = np.random.randint(0, 100, size=(5, 1000))

def add_these(numbers_to_add):
    added = np.sum(numbers_to_add)
    return added

results = []

tic = timeit.default_timer()  # start timer
pool = Pool(3)

if __name__ == '__main__':
    for row in data:
        pool.apply_async(add_these, row, callback=results.append)

toc = timeit.default_timer()  # end timer
print(toc - tic)

print(results)
EDIT: Closing and joining the pool, then printing results within the if __name__ == '__main__' block, results in the following error being raised repeatedly until I manually stop execution:
RuntimeError:
An attempt has been made to start a new process before the current process has finished its bootstrapping phase. This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable.
Code to reproduce error:
import numpy as np
from multiprocessing import Pool, freeze_support
import timeit

data = np.random.randint(0, 100, size=(5, 1000))

def add_these(numbers_to_add):
    added = np.sum(numbers_to_add)
    return added

results = []

tic = timeit.default_timer()  # start timer
pool = Pool(3)

if __name__ == '__main__':
    for row in data:
        pool.apply_async(add_these, (row,), callback=results.append)
    pool.close()
    pool.join()
    print(results)

toc = timeit.default_timer()  # end timer
print(toc - tic)
I think this would be a more correct way:
import numpy as np
from multiprocessing import Pool, TimeoutError
import timeit

data = np.random.randint(0, 100, size=(5, 1000))

def add_these(numbers_to_add):
    added = np.sum(numbers_to_add)
    return added

results = []

if __name__ == '__main__':
    with Pool(processes=3) as pool:
        for row in data:
            results = pool.apply_async(add_these, (row,))
            try:
                print(results.get(timeout=1))
            except TimeoutError:
                print("Multiprocessing Timeout")
I would like to know how I can determine that values in shared memory were changed. For example, I have this code:
import array
import time
from multiprocessing import Process, Queue, shared_memory, Manager

def process_1():
    buffer = shared_memory.ShareableList(range(5), name='AAA')
    time.sleep(1)
    buffer[1] = 100

def process_2():
    buffer = shared_memory.ShareableList(name='AAA')
    print(buffer[1])

if __name__ == '__main__':
    p_1 = Process(target=process_1)
    p_1.start()
    p_2 = Process(target=process_2)
    p_2.start()
    p_1.join()
How can process_2 determine that process_1 changed a value in the buffer?
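One possible approach, sketched here rather than taken from the original thread (the changed Event and its argument are additions that are not in the original code), is to pair the ShareableList with a multiprocessing.Event that the writer sets after modifying the value:
import time
from multiprocessing import Process, Event, shared_memory

def process_1(changed):
    buffer = shared_memory.ShareableList(range(5), name='AAA')
    time.sleep(1)
    buffer[1] = 100
    changed.set()      # signal that a value in shared memory was modified

def process_2(changed):
    buffer = shared_memory.ShareableList(name='AAA')
    changed.wait()     # block until process_1 signals the change
    print(buffer[1])   # prints 100

if __name__ == '__main__':
    changed = Event()
    p_1 = Process(target=process_1, args=(changed,))
    p_1.start()
    p_2 = Process(target=process_2, args=(changed,))
    p_2.start()
    p_1.join()
    p_2.join()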
I want to know if there is a way to make multiprocessing work in this code. What should I change, or is there another function in multiprocessing that will allow me to do this operation?
You can call the locateOnScreen('calc7key.png') function to get the screen coordinates. The return value is a 4-integer tuple: (left, top, width, height).
I got this error:
checkNumber1 = resourceBlankLightTemp[1]
TypeError: 'Process' object does not support indexing
import pyautogui, time, os, logging, sys, random, copy
import multiprocessing as mp

BLANK_DARK = os.path.join('images', 'blankDark.png')
BLANK_LIGHT = os.path.join('images', 'blankLight.png')

def blankFirstDarkResourcesIconPosition():
    blankDarkIcon = pyautogui.locateOnScreen(BLANK_DARK)
    return blankDarkIcon

def blankFirstLightResourcesIconPosition():
    blankLightIcon = pyautogui.locateOnScreen(BLANK_LIGHT)
    return blankLightIcon

def getRegionOfResourceImage():
    global resourceIconRegion
    resourceBlankLightTemp = mp.Process(target=blankFirstLightResourcesIconPosition)
    resourceBlankDarkTemp = mp.Process(target=blankFirstDarkResourcesIconPosition)
    resourceBlankLightTemp.start()
    resourceBlankDarkTemp.start()

    if(resourceBlankLightTemp == None):
        checkNumber1 = 2000
    else:
        checkNumber1 = resourceBlankLightTemp[1]

    if(resourceBlankDarkTemp == None):
        checkNumber2 = 2000
    else:
        checkNumber2 = resourceBlankDarkTemp[1]
In general, if you just want to use multiprocessing to run existing CPU-intensive functions in parallel, it is easiest to do through a Pool, as shown in the example at the beginning of the documentation:
# ...

def call_it(func):
    # Helper defined at module level so it can be pickled and sent to workers
    return func()

def getRegionOfResourceImage():
    global resourceIconRegion
    with mp.Pool(2) as p:
        resourceBlankLightTemp, resourceBlankDarkTemp = p.map(
            call_it, [blankFirstLightResourcesIconPosition,
                      blankFirstDarkResourcesIconPosition])

    if resourceBlankLightTemp is None:
        # ...
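Alternatively, the two lookups can be submitted individually with apply_async and collected with get(); a minimal sketch using the same function names as above:
def getRegionOfResourceImage():
    with mp.Pool(2) as p:
        light = p.apply_async(blankFirstLightResourcesIconPosition)
        dark = p.apply_async(blankFirstDarkResourcesIconPosition)
        # get() blocks until each screen lookup has finished
        resourceBlankLightTemp = light.get()
        resourceBlankDarkTemp = dark.get()

    if resourceBlankLightTemp is None:
        checkNumber1 = 2000
    else:
        checkNumber1 = resourceBlankLightTemp[1]
    # ... same handling for resourceBlankDarkTemp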
I'm writing some practice code so I can implement the ideas once I have a better idea of what I'm doing. The code uses multiprocessing to reduce the runtime by splitting a stack of three arrays into several horizontal pieces that are then processed in parallel via map_async. However, the code seems to hang at the first .recv() call on a pipe, even though the Pipe() objects are fully defined, and I'm not sure why. If I manually define each Pipe() object the code works just fine, but as soon as I create them in a loop the code hangs at sec1 = pipes[0][0].recv(). How can I fix this?
from multiprocessing import Process, Pipe
import multiprocessing as mp
import numpy as np
import math

num_sections = 4

pipes_send = [None]*num_sections
pipes_recv = [None]*num_sections
pipes = zip(pipes_recv, pipes_send)
for i in range(num_sections):
    pipes[i] = list(pipes[i])
for i in range(num_sections):
    pipes[i][0], pipes[i][1] = Pipe()

def f(sec_num):
    for plane in range(3):
        hist_sec[sec_num][plane] += rand_sec[sec_num][plane]
    if sec_num == 0:
        pipes[0][1].send(hist_sec[sec_num])
        pipes[0][1].close()
    if sec_num == 1:
        pipes[1][1].send(hist_sec[sec_num])
        pipes[1][1].close()
    if sec_num == 2:
        pipes[2][1].send(hist_sec[sec_num])
        pipes[2][1].close()
    if sec_num == 3:
        pipes[3][1].send(hist_sec[sec_num])
        pipes[3][1].close()

hist = np.zeros((3, 512, 512))
hist_sec = []
randmat = np.random.rand(3, 512, 512)
rand_sec = []

for plane in range(3):
    hist_div = np.array_split(hist[plane], num_sections)
    hist_sec.append(hist_div)
    randmatsplit = np.array_split(randmat[plane], num_sections)
    rand_sec.append(randmatsplit)

hist_sec = np.rollaxis(np.asarray(hist_sec), 1, 0)
rand_sec = np.rollaxis(np.asarray(rand_sec), 1, 0)

if __name__ == '__main__':
    pool = mp.Pool(num_sections)
    args = np.arange(num_sections)
    pool.map_async(f, args, chunksize=1)

    sec1 = pipes[0][0].recv()
    sec2 = pipes[1][0].recv()
    sec3 = pipes[2][0].recv()
    sec4 = pipes[3][0].recv()

    for plane in range(3):
        hist_plane = np.concatenate((sec1[plane], sec2[plane], sec3[plane], sec4[plane]), axis=0)
        hist_full.append(hist_plane)

    pool.close()
    pool.join()