Multiprocessing with numpy makes Python quit unexpectedly on OSX

Multiprocessing with numpy makes Python quit unexpectedly on OSX - python

I've run into a problem, where Python quits unexpectedly, when running multiprocessing with numpy. I've isolated the problem, so that I can now confirm that the multiprocessing works perfect when running the code stated below:
import numpy as np
from multiprocessing import Pool, Process
import time
import cPickle as p
def test(args):
x,i = args
if i == 2:
time.sleep(4)
arr = np.dot(x.T,x)
print i
if __name__ == '__main__':
x = np.random.random(size=((2000,500)))
evaluations = [(x,i) for i in range(5)]
p = Pool()
p.map_async(test,evaluations)
p.close()
p.join()
The problem occurs when I try to evaluate the code below. This makes Python quit unexpectedly:
import numpy as np
from multiprocessing import Pool, Process
import time
import cPickle as p
def test(args):
x,i = args
if i == 2:
time.sleep(4)
arr = np.dot(x.T,x)
print i
if __name__ == '__main__':
x = np.random.random(size=((2000,500)))
test((x,4)) # Added code
evaluations = [(x,i) for i in range(5)]
p = Pool()
p.map_async(test,evaluations)
p.close()
p.join()
Please help someone. I'm open to all suggestions. Thanks. Note: I have tried two different machines and the same problem occurs.

This is a known issue with multiprocessing and numpy on MacOS X, and a bit of a duplicate of:
segfault using numpy's lapack_lite with multiprocessing on osx, not linux
http://mail.scipy.org/pipermail/numpy-discussion/2012-August/063589.html
The answer seems to be to use a different BLAS other than the Apple accelerate framework when linking Numpy... unfortunate :(

I figured out a workaround to the problem. The problem occurs when Numpy is used together with BLAS before initializing a multiprocessing instance. My workaround is simply to put the Numpy code (running BLAS) into a single process and then running the multiprocessing instances afterwards. This is not a good coding style, but it works. See example below:
Following will fail - Python will quit:
import numpy as np
from multiprocessing import Pool, Process
def test(x):
arr = np.dot(x.T,x) # On large matrices, this calc will use BLAS.
if __name__ == '__main__':
x = np.random.random(size=((2000,500))) # Random matrix
test(x)
evaluations = [x for _ in range(5)]
p = Pool()
p.map_async(test,evaluations) # This is where Python will quit, because of the prior use of BLAS.
p.close()
p.join()
Following will succeed:
import numpy as np
from multiprocessing import Pool, Process
def test(x):
arr = np.dot(x.T,x) # On large matrices, this calc will use BLAS.
if __name__ == '__main__':
x = np.random.random(size=((2000,500))) # Random matrix
p = Process(target = test,args = (x,))
p.start()
p.join()
evaluations = [x for _ in range(5)]
p = Pool()
p.map_async(test,evaluations)
p.close()
p.join()

Related

Multiprocessing in Python: Parallelize a for loop to fill a Numpy array

I've been reading threads like this one but any of them seems to work for my case. I'm trying to parallelize the following toy example to fill a Numpy array inside a for loop using Multiprocessing in Python:
import numpy as np
from multiprocessing import Pool
import time
def func1(x, y=1):
return x**2 + y
def func2(n, parallel=False):
my_array = np.zeros((n))
# Parallelized version:
if parallel:
pool = Pool(processes=6)
for idx, val in enumerate(range(1, n+1)):
result = pool.apply_async(func1, [val])
my_array[idx] = result.get()
pool.close()
# Not parallelized version:
else:
for i in range(1, n+1):
my_array[i-1] = func1(i)
return my_array
def main():
start = time.time()
my_array = func2(60000)
end = time.time()
print(my_array)
print("Normal time: {}\n".format(end-start))
start_parallel = time.time()
my_array_parallelized = func2(60000, parallel=True)
end_parallel = time.time()
print(my_array_parallelized)
print("Time based on multiprocessing: {}".format(end_parallel-start_parallel))
if __name__ == '__main__':
main()
The lines in the code based on Multiprocessing seem to work and give you the right results. However, it takes far longer than the non parallelized version. Here is the output of both versions:
[2.00000e+00 5.00000e+00 1.00000e+01 ... 3.59976e+09 3.59988e+09
3.60000e+09]
Normal time: 0.01605963706970215
[2.00000e+00 5.00000e+00 1.00000e+01 ... 3.59976e+09 3.59988e+09
3.60000e+09]
Time based on multiprocessing: 2.8775112628936768
My intuition tells me that it should be a better way of capturing results from pool.apply_async(). What am I doing wrong? What is the most efficient way to accomplish this? Thx.

Creating processes is expensive. On my machine it take at leas several hundred of microsecond per process created. Moreover, the multiprocessing module copy the data to be computed between process and then gather the results from the process pool. This inter-process communication is very slow too. The problem is that your computation is trivial and can be done very quickly, likely much faster than all the introduced overhead. The multiprocessing module is only useful when you are dealing with quite small datasets and perform intensive computation (compared to the amount of computed data).
Hopefully, when it comes to numericals computations using Numpy, there is a simple and fast way to parallelize your application: the Numba JIT. Numba can parallelize a code if you explicitly use parallel structures (parallel=True and prange). It uses threads and not heavy processes that are working in shared memory. Numba can overcome the GIL if your code does not deal with native types and Numpy arrays instead of pure Python dynamic object (lists, big integers, classes, etc.). Here is an example:
import numpy as np
import numba as nb
import time
#nb.njit
def func1(x, y=1):
return x**2 + y
#nb.njit('float64[:](int64)', parallel=True)
def func2(n):
my_array = np.zeros(n)
for i in nb.prange(1, n+1):
my_array[i-1] = func1(i)
return my_array
def main():
start = time.time()
my_array = func2(60000)
end = time.time()
print(my_array)
print("Numba time: {}\n".format(end-start))
if __name__ == '__main__':
main()
Because Numba compiles the code at runtime, it is able to fully optimize the loop to a no-op resulting in a time close to 0 second in this case.

Here is the solution proposed by #thisisalsomypassword that improves my initial proposal. That is, "collecting the AsyncResult objects in a list within the loop and then calling AsyncResult.get() after all processes have started on each result object":
import numpy as np
from multiprocessing import Pool
import time
def func1(x, y=1):
time.sleep(0.1)
return x**2 + y
def func2(n, parallel=False):
my_array = np.zeros((n))
# Parallelized version:
if parallel:
pool = Pool(processes=6)
####### HERE COMES THE CHANGE #######
results = [pool.apply_async(func1, [val]) for val in range(1, n+1)]
for idx, val in enumerate(results):
my_array[idx] = val.get()
#######
pool.close()
# Not parallelized version:
else:
for i in range(1, n+1):
my_array[i-1] = func1(i)
return my_array
def main():
start = time.time()
my_array = func2(600)
end = time.time()
print(my_array)
print("Normal time: {}\n".format(end-start))
start_parallel = time.time()
my_array_parallelized = func2(600, parallel=True)
end_parallel = time.time()
print(my_array_parallelized)
print("Time based on multiprocessing: {}".format(end_parallel-start_parallel))
if __name__ == '__main__':
main()
Now it works. Time is reduced considerably with Multiprocessing:
Normal time: 60.107836008071
Time based on multiprocessing: 10.049324989318848
time.sleep(0.1) was added in func1 to cancel out the effect of being a super trivial task.

Calculate mean in Monte Carlo Simulation using python multiprocessing

I have been reading about multiprocessing in Python (e.g. I have read this and this and this and this and so on; I have also read/watched different websites/videos such as this and this and this and so many more!) but I am still confused how I could apply multiprocessing to my specific problem. I have written a simple example code for calculating the avg value of randomly generated integers using Monte Carlo Simulation (I store the random integers in a variable called integers so I can finally calculate the mean; I am also generating random numpy.ndarrays and store them in a variable called arrays as I need to do some post-processing on those arrays later too):
import numpy as np
nMCS = 10 ** 8
integers = []
arrays = []
for i in range(nMCS):
a = np.random.randint(0,10)
b = np.random.rand(10,2)
integers.append(a)
arrays.append(b)
mean_val = np.average(integers)
# I will do post-processing on 'arrays' later!!
Now I want to utilize all of the 16 cores on my machine, so the random numbers/arrays are not generated in sequence and I can speed up the process. Based on what I have learnt, I recognize I need to store the results of each Monte Carlo Simulation (i.e. the generated random integer and random numpy.ndarray) and then use Inter-process communication in order to later store all of the results in a list. I have written different codes but unfortunately non of them work. As an example, when I write something like this:
import numpy as np
import multiprocessing
nMCS = 10 ** 6
integers = []
arrays = []
def monte_carlo():
a = np.random.randint(0,10)
b = np.random.rand(10,2)
if __name__ == '__main__':
__spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!
p1 = multiprocessing.Process(target = monte_carlo)
p1.start()
p1.join()
for i in range(nMCS):
integers.append(a)
arrays.append(b)
I get the error "name 'a' is not defined". So could anyone please help me with this and tell me how I could generate as many random integers/arrays as possible concurrently, and then add them all to a list for further processing?

Due to the fact that returning a lot of result causes time for propagation between process, I would suggest to divide the task in few part and process it before returning back.
n = 4
def monte_carlo():
raw_result = []
for j in range(10**4 / n):
a = np.random.randint(0,10)
b = np.random.rand(10,2)
raw_result .append([a,b])
result = processResult(raw_result)
#Your method to reduce the result return,
#let's assume the return value is [avg(a),reformed_array(b)]
return result
if __name__ == '__main__':
__spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!
pool = Pool(processes=4)
#you can control how many processes here, for example multiprocessing.cpu_count()-1 to avoid completely blocking
multiple_results = [pool.apply_async(monte_carlo, (i,)) for i in range(n)]
data = [res.get() for res in multiple_results]
#OR
data = pool.map(monte_carlo, [i for i in range(n)])
#Both return you a list of [avg(a),reformed_array(b)]

Simple error.
a and b are created in your function
They do not exist in your main scope. you will need to return them back from your function
def monte_carlo():
a = np.random.randint(0,10)
b = np.random.rand(10,2)
#create a return statement here. It may help if you put them into an array so you can return 2 value
if __name__ == '__main__':
__spec__ = "ModuleSpec(name='builtins', loader=<class
'_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!
p1 = multiprocessing.Process(target = monte_carlo)
p1.start()
p1.join()
#Call your function here and save the return to something
for i in range(nMCS):
integers.append(a) # paste here
arrays.append(b) # and here
Edit: tested code and found you were never calling your monte_carlo function. a and b are now working correctly but you have a new error to try and solve. Sorry but I wont be able to help with this error as I dont understand it myself, but here is my edit of your code.
import numpy as np
import multiprocessing
nMCS = 10 ** 6
integers = []
arrays = []
def monte_carlo():
a = np.random.randint(0,10)
b = np.random.rand(10,2)
temp = [a,b]
return temp
if __name__ == '__main__':
__spec__ = "ModuleSpec(name='builtins', loader=<class
'_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!
p1 = multiprocessing.Process(target = monte_carlo())#added the extra brackets here
p1.start()
p1.join()
for i in range(nMCS):
array = monte_carlo()
integers.append(array[0])
arrays.append(array[1])
and here is the error I got with this edit. I am still learning multi processing myself so other people may be better suited to help with this
Process Process-6:
Traceback (most recent call last):
File"c:\users\lunar\appdata\local\continuum\anaconda3\lib\multiprocessing\process.py", line 252, in _bootstrap
self.run()
File "c:\users\lunar\appdata\local\continuum\anaconda3\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
TypeError: 'list' object is not callable

Python multiprocessing with Pool - the main process takes forever

I am trying to understand how multiprocessing works with Python. Here's my test code:
import numpy as np
import multiprocessing
import time
def worker(a):
for i in range(len(a)):
for j in arr2:
a[i] = a[i]*j
return len(a)
arr2 = np.random.rand(10000).tolist()
if __name__ == '__main__':
multiprocessing.freeze_support()
cores = multiprocessing.cpu_count()
arr1 = np.random.rand(1000000).tolist()
tmp = time.time()
pool = multiprocessing.Pool(processes=cores)
result = pool.map(worker, [arr1], chunksize=1000000/(cores-1))
print "mp time", time.time()-tmp
I have 8 cores. It usually ends up with 7 processes using only ~3% of the CPU for about a second, and the last process uses ~1/8 of the CPU for forever...(it has been running for about 15 minutes)
I understand that the interprocess communication usually bounds the complexity of parallel programming, but does it usually take this long? What else could cause the last process to take forever?
This thread: Python multiprocessing never joins seems to address a similar issue but it doesn't solve the problem with Pool.

It looks like you want to divide the work into chunks. You can use the range function to partition the data. On Linux, forked processes get a copy-on-write view of the parent memory so you can just pass down the indexes you want to work on. On Windows, no such luck. You need to pass in each sublist. This program should do it
import numpy as np
import multiprocessing
import time
import platform
def worker(a):
if platform.system() == "Linux":
# on linux we passed in start:len
start, length = a
a = arr1[start:length]
for i in range(len(a)):
for j in arr2:
a[i] = a[i]*j
return len(a)
arr2 = np.random.rand(10000).tolist()
if __name__ == '__main__':
multiprocessing.freeze_support()
cores = multiprocessing.cpu_count()
arr1 = np.random.rand(1000000).tolist()
tmp = time.time()
pool = multiprocessing.Pool(processes=cores)
chunk = (len(arr1)+cores-1)//cores
# on Windows, pass the sublist, on linux just the indexes and let the
# worker split from the view of parent memory space
if platform.system() == "Linux":
seq = [(i, i+chunk) for i in range(0, len(arr1), chunk)]
else:
seq = [arr1[i:i+chunk] for i in range(0, len(arr1), chunk)]
result = pool.map(worker, seq, chunksize=1)
print "mp time", time.time()-tmp

You point is here:
pool.map will automatically iterate the object which is [arr1] in your program. Please notice that the object is [arr1] but not arr1, that means the length of object you pass to pool.map is only one.
I think the simplest solution is replace [arr1] with arr1.

Python multiprocessing and shared numpy array

I have a problem, which is similar to this:
import numpy as np
C = np.zeros((100,10))
for i in range(10):
C_sub = get_sub_matrix_C(i, other_args) # shape 10x10
C[i*10:(i+1)*10,:10] = C_sub
So, apparently there is no need to run this as a serial calculation, since each submatrix can be calculated independently.
I would like to use the multiprocessing module and create up to 4 processes for the for loop.
I read some tutorials about multiprocessing, but wasn't able to figure out how to use this to solve my problem.
Thanks for your help

A simple way to parallelize that code would be to use a Pool of processes:
pool = multiprocessing.Pool()
results = pool.starmap(get_sub_matrix_C, ((i, other_args) for i in range(10)))
for i, res in enumerate(results):
C[i*10:(i+1)*10,:10] = res
I've used starmap since the get_sub_matrix_C function has more than one argument (starmap(f, [(x1, ..., xN)]) calls f(x1, ..., xN)).
Note however that serialization/deserialization may take significant time and space, so you may have to use a more low-level solution to avoid that overhead.
It looks like you are running an outdated version of python. You can replace starmap with plain map but then you have to provide a function that takes a single parameter:
def f(args):
return get_sub_matrix_C(*args)
pool = multiprocessing.Pool()
results = pool.map(f, ((i, other_args) for i in range(10)))
for i, res in enumerate(results):
C[i*10:(i+1)*10,:10] = res

The following recipe perhaps can do the job. Feel free to ask.
import numpy as np
import multiprocessing
def processParallel():
def own_process(i, other_args, out_queue):
C_sub = get_sub_matrix_C(i, other_args)
out_queue.put(C_sub)
sub_matrices_list = []
out_queue = multiprocessing.Queue()
other_args = 0
for i in range(10):
p = multiprocessing.Process(
target=own_process,
args=(i, other_args, out_queue))
procs.append(p)
p.start()
for i in range(10):
sub_matrices_list.extend(out_queue.get())
for p in procs:
p.join()
return sub_matrices_list
C = np.zeros((100,10))
result = processParallel()
for i in range(10):
C[i*10:(i+1)*10,:10] = result[i]

numpy and multiprocessing, how it work [duplicate]

I would like to use a numpy array in shared memory for use with the multiprocessing module. The difficulty is using it like a numpy array, and not just as a ctypes array.
from multiprocessing import Process, Array
import scipy
def f(a):
a[0] = -a[0]
if __name__ == '__main__':
# Create the array
N = int(10)
unshared_arr = scipy.rand(N)
arr = Array('d', unshared_arr)
print "Originally, the first two elements of arr = %s"%(arr[:2])
# Create, start, and finish the child processes
p = Process(target=f, args=(arr,))
p.start()
p.join()
# Printing out the changed values
print "Now, the first two elements of arr = %s"%arr[:2]
This produces output such as:
Originally, the first two elements of arr = [0.3518653236697369, 0.517794725524976]
Now, the first two elements of arr = [-0.3518653236697369, 0.517794725524976]
The array can be accessed in a ctypes manner, e.g. arr[i] makes sense. However, it is not a numpy array, and I cannot perform operations such as -1*arr, or arr.sum(). I suppose a solution would be to convert the ctypes array into a numpy array. However (besides not being able to make this work), I don't believe it would be shared anymore.
It seems there would be a standard solution to what has to be a common problem.

To add to #unutbu's (not available anymore) and #Henry Gomersall's answers. You could use shared_arr.get_lock() to synchronize access when needed:
shared_arr = mp.Array(ctypes.c_double, N)
# ...
def f(i): # could be anything numpy accepts as an index such another numpy array
with shared_arr.get_lock(): # synchronize access
arr = np.frombuffer(shared_arr.get_obj()) # no data copying
arr[i] = -arr[i]
Example
import ctypes
import logging
import multiprocessing as mp
from contextlib import closing
import numpy as np
info = mp.get_logger().info
def main():
logger = mp.log_to_stderr()
logger.setLevel(logging.INFO)
# create shared array
N, M = 100, 11
shared_arr = mp.Array(ctypes.c_double, N)
arr = tonumpyarray(shared_arr)
# fill with random values
arr[:] = np.random.uniform(size=N)
arr_orig = arr.copy()
# write to arr from different processes
with closing(mp.Pool(initializer=init, initargs=(shared_arr,))) as p:
# many processes access the same slice
stop_f = N // 10
p.map_async(f, [slice(stop_f)]*M)
# many processes access different slices of the same array
assert M % 2 # odd
step = N // 10
p.map_async(g, [slice(i, i + step) for i in range(stop_f, N, step)])
p.join()
assert np.allclose(((-1)**M)*tonumpyarray(shared_arr), arr_orig)
def init(shared_arr_):
global shared_arr
shared_arr = shared_arr_ # must be inherited, not passed as an argument
def tonumpyarray(mp_arr):
return np.frombuffer(mp_arr.get_obj())
def f(i):
"""synchronized."""
with shared_arr.get_lock(): # synchronize access
g(i)
def g(i):
"""no synchronization."""
info("start %s" % (i,))
arr = tonumpyarray(shared_arr)
arr[i] = -1 * arr[i]
info("end %s" % (i,))
if __name__ == '__main__':
mp.freeze_support()
main()
If you don't need synchronized access or you create your own locks then mp.Array() is unnecessary. You could use mp.sharedctypes.RawArray in this case.

The Array object has a get_obj() method associated with it, which returns the ctypes array which presents a buffer interface. I think the following should work...
from multiprocessing import Process, Array
import scipy
import numpy
def f(a):
a[0] = -a[0]
if __name__ == '__main__':
# Create the array
N = int(10)
unshared_arr = scipy.rand(N)
a = Array('d', unshared_arr)
print "Originally, the first two elements of arr = %s"%(a[:2])
# Create, start, and finish the child process
p = Process(target=f, args=(a,))
p.start()
p.join()
# Print out the changed values
print "Now, the first two elements of arr = %s"%a[:2]
b = numpy.frombuffer(a.get_obj())
b[0] = 10.0
print a[0]
When run, this prints out the first element of a now being 10.0, showing a and b are just two views into the same memory.
In order to make sure it is still multiprocessor safe, I believe you will have to use the acquire and release methods that exist on the Array object, a, and its built in lock to make sure its all safely accessed (though I'm not an expert on the multiprocessor module).

While the answers already given are good, there is a much easier solution to this problem provided two conditions are met:
You are on a POSIX-compliant operating system (e.g. Linux, Mac OSX); and
Your child processes need read-only access to the shared array.
In this case you do not need to fiddle with explicitly making variables shared, as the child processes will be created using a fork. A forked child automatically shares the parent's memory space. In the context of Python multiprocessing, this means it shares all module-level variables; note that this does not hold for arguments that you explicitly pass to your child processes or to the functions you call on a multiprocessing.Pool or so.
A simple example:
import multiprocessing
import numpy as np
# will hold the (implicitly mem-shared) data
data_array = None
# child worker function
def job_handler(num):
# built-in id() returns unique memory ID of a variable
return id(data_array), np.sum(data_array)
def launch_jobs(data, num_jobs=5, num_worker=4):
global data_array
data_array = data
pool = multiprocessing.Pool(num_worker)
return pool.map(job_handler, range(num_jobs))
# create some random data and execute the child jobs
mem_ids, sumvals = zip(*launch_jobs(np.random.rand(10)))
# this will print 'True' on POSIX OS, since the data was shared
print(np.all(np.asarray(mem_ids) == id(data_array)))

I've written a small python module that uses POSIX shared memory to share numpy arrays between python interpreters. Maybe you will find it handy.
https://pypi.python.org/pypi/SharedArray
Here's how it works:
import numpy as np
import SharedArray as sa
# Create an array in shared memory
a = sa.create("test1", 10)
# Attach it as a different array. This can be done from another
# python interpreter as long as it runs on the same computer.
b = sa.attach("test1")
# See how they are actually sharing the same memory block
a[0] = 42
print(b[0])
# Destroying a does not affect b.
del a
print(b[0])
# See how "test1" is still present in shared memory even though we
# destroyed the array a.
sa.list()
# Now destroy the array "test1" from memory.
sa.delete("test1")
# The array b is not affected, but once you destroy it then the
# data are lost.
print(b[0])

You can use the sharedmem module: https://bitbucket.org/cleemesser/numpy-sharedmem
Here's your original code then, this time using shared memory that behaves like a NumPy array (note the additional last statement calling a NumPy sum() function):
from multiprocessing import Process
import sharedmem
import scipy
def f(a):
a[0] = -a[0]
if __name__ == '__main__':
# Create the array
N = int(10)
unshared_arr = scipy.rand(N)
arr = sharedmem.empty(N)
arr[:] = unshared_arr.copy()
print "Originally, the first two elements of arr = %s"%(arr[:2])
# Create, start, and finish the child process
p = Process(target=f, args=(arr,))
p.start()
p.join()
# Print out the changed values
print "Now, the first two elements of arr = %s"%arr[:2]
# Perform some NumPy operation
print arr.sum()

With Python3.8+ there is the multiprocessing.shared_memory standard library:
# np_sharing.py
from multiprocessing import Process
from multiprocessing.managers import SharedMemoryManager
from multiprocessing.shared_memory import SharedMemory
from typing import Tuple
import numpy as np
def create_np_array_from_shared_mem(
shared_mem: SharedMemory, shared_data_dtype: np.dtype, shared_data_shape: Tuple[int, ...]
) -> np.ndarray:
arr = np.frombuffer(shared_mem.buf, dtype=shared_data_dtype)
arr = arr.reshape(shared_data_shape)
return arr
def child_process(
shared_mem: SharedMemory, shared_data_dtype: np.dtype, shared_data_shape: Tuple[int, ...]
):
"""Logic to be executed by the child process"""
arr = create_np_array_from_shared_mem(shared_mem, shared_data_dtype, shared_data_shape)
arr[0, 0] = -arr[0, 0] # modify the array backed by shared memory
def main():
"""Logic to be executed by the parent process"""
# Data to be shared:
data_to_share = np.random.rand(10, 10)
SHARED_DATA_DTYPE = data_to_share.dtype
SHARED_DATA_SHAPE = data_to_share.shape
SHARED_DATA_NBYTES = data_to_share.nbytes
with SharedMemoryManager() as smm:
shared_mem = smm.SharedMemory(size=SHARED_DATA_NBYTES)
arr = create_np_array_from_shared_mem(shared_mem, SHARED_DATA_DTYPE, SHARED_DATA_SHAPE)
arr[:] = data_to_share # load the data into shared memory
print(f"The [0,0] element of arr is {arr[0,0]}") # before
# Run child process:
p = Process(target=child_process, args=(shared_mem, SHARED_DATA_DTYPE, SHARED_DATA_SHAPE))
p.start()
p.join()
print(f"The [0,0] element of arr is {arr[0,0]}") # after
del arr # delete np array so the shared memory can be deallocated
if __name__ == "__main__":
main()
Running the script:
$ python3.10 np_sharing.py
The [0,0] element of arr is 0.262091705529628
The [0,0] element of arr is -0.262091705529628
Since the arrays in different processes share the same underlying memory buffer, the standard caveats r.e. race conditions apply.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Multiprocessing with numpy makes Python quit unexpectedly on OSX - python

Related

Multiprocessing in Python: Parallelize a for loop to fill a Numpy array

Calculate mean in Monte Carlo Simulation using python multiprocessing

Python multiprocessing with Pool - the main process takes forever

Python multiprocessing and shared numpy array

numpy and multiprocessing, how it work [duplicate]

Categories

Resources