Shared Non-Contiguous-Access Numpy Array - python

I have a numpy array that I would like to share between a bunch of python processes in a way that doesn't involve copies. I create a shared numpy array from an existing numpy array using the sharedmem package.
import sharedmem as shm

def convert_to_shared_array(A):
    shared_array = shm.shared_empty(A.shape, A.dtype, order="C")
    shared_array[...] = A
    return shared_array
My problem is that each subprocess needs to access rows that are randomly distributed in the array. Currently I create the shared numpy array as above and pass it to each subprocess. Each process also has a list, idx, of the rows it needs to access. The problem is in the subprocess, when I do:
#idx = list of randomly distributed integers
local_array = shared_array[idx,:]
# Do stuff with local array
This creates a copy of the array instead of just another view. The array is quite large, and rearranging it before sharing it so that each process accesses a contiguous range of rows, as in
local_array = shared_array[start:stop,:]
takes too long.
Question: What are good solutions for sharing random access to a numpy array between python processes that don't involve copying the array?
The subprocesses need read-only access (so there is no need for locking on access).

Fancy indexing induces a copy, so if you want to avoid copies you need to avoid fancy indexing; there is no way around it.
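A quick way to see the difference: basic indexing (scalars and slices) returns views into the same buffer, while fancy indexing with an integer list always materialises a fresh array. A minimal check using np.shares_memory:

import numpy as np

arr = np.arange(20.0).reshape(4, 5)
idx = [2, 0, 3]

view = arr[1:3, :]                    # basic indexing: a view, no copy
fancy = arr[idx, :]                   # fancy indexing: always a fresh copy

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, fancy))   # False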

Related

Is there a Python package that generates SOA data structure from AOS?

I have been working lately on improving the performance of existing Python applications. What I found is that arranging data in arrays of basic data types (a struct of arrays) instead of an array of structs/classes can increase performance. Keeping data in contiguous memory also makes offloading heavy calculations to the GPU easier. My goal is to provide our users with a way to make use of SOA without having to know about numpy, numba, SIMD, etc.
Intel provides a template library and containers that can generate SIMD-friendly data layouts from a struct/class:
https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/libraries/introduction-to-the-simd-data-layout-templates.html
This already gives an idea how something similar could be done in Python.
So, instead of reading objects from memory (that can be distributed somewhere in memory),
class A:
    def __init__(self, x):
        self.x = x

a_s = [A(1) for _ in range(10)]
for a in a_s:
    a.x = 2
I would like to have the data accessible both as a numpy array AND as instances of class A, so that the data can be accessed something like this:
sdlt_container = generate_SDLT(A(), 10)
a = sdlt_container[2]       # returns an instance of class A
a.x = 2                     # writes through to the numpy array, setting x[2] = 2
sdlt_container.x[0:5] = 3   # change x in several "instances of A"
Accessing data as an instance of class A might involve creating a new instance of A, but the variables in this object should "point to" the correct index in the numpy array. I understand that optimizations like those the Intel compiler performs on a for loop are not possible in Python (interpreted vs. compiled).
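To make it concrete, here is a rough proof of concept of what I imagine: a container holding one numpy array per attribute, plus a small proxy class that reads and writes through to it (SDLTContainer and Proxy are made-up names):

import numpy as np

class SDLTContainer:
    # One contiguous numpy column per attribute of A.
    def __init__(self, n):
        self.x = np.zeros(n)

    def __getitem__(self, i):
        return Proxy(self, i)

class Proxy:
    # Looks like an instance of A, but reads/writes the container's arrays.
    def __init__(self, container, i):
        self._c = container
        self._i = i

    @property
    def x(self):
        return self._c.x[self._i]

    @x.setter
    def x(self, value):
        self._c.x[self._i] = value

sdlt_container = SDLTContainer(10)
a = sdlt_container[2]
a.x = 2                     # writes x[2] = 2 in the underlying array
sdlt_container.x[0:5] = 3   # vectorised update across several "instances"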
Thanks for any ideas!

faster copying of an object (like deepcopy)?

I have a class in python3 that contains a few variables and represents a state.
During the program (a simulation) I need to make a large number of copies of this state so that I can change it and still keep the previous information.
The problem is that deepcopy from the copy module is too slow. Would I be better off creating a method in the class that copies the object, i.e. creates and returns a new object and copies the values of each variable? Note: inside the object there is a 3D list as well.
Is there any other solution to this? deepcopy really is too slow; it takes more than 99% of the execution time according to cProfile.
Edit: Would representing the 3D list and other lists as numpy arrays/matrices and copying them with numpy inside a custom copy function be the better way?
For people from the future having the same problem:
What I did was create a method inside the class that manually copies the information. I did not override deepcopy; maybe that would be cleaner, maybe not.
I tried with and without numpy for the 2D and 3D lists, but appending two numpy arrays later in the code was much more time consuming than adding two lists with + (which I did need to do for my specific program).
So I used:
my_list = list(map(list, my_list)) # for 2D list
my_list = [list(map(list, x)) for x in my_list] # for 3D list
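Put together, the manual copy method looked roughly like this (the State class and its attributes are made-up names for illustration):

class State:
    def __init__(self, score, grid):
        self.score = score   # plain value: assignment is enough
        self.grid = grid     # 3D list: needs an element-wise copy

    def copy(self):
        # Manual copy of a known structure; far faster than copy.deepcopy,
        # which has to inspect every object at runtime.
        new = State.__new__(State)
        new.score = self.score
        new.grid = [list(map(list, plane)) for plane in self.grid]
        return new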

avoid copying during array conversion between numpy and mxnet

I want to reduce a memory copy step during my data processing pipeline.
I want to do the following:
Generate some data from a custom C library
Feed the generated data into a MXNet model running on GPU.
For now, my pipeline does the following:
1. Create a C-contiguous numpy array via np.empty(...).
2. Get a pointer to the numpy array via np.ndarray.__array_interface__.
3. Call the C library from Python (via ctypes) to fill the numpy array.
4. Convert the numpy array into an mxnet NDArray; this copies the underlying memory buffer.
5. Pack the NDArrays into a mx.io.DataBatch instance, then feed it into the model.
Please note that, before being fed into the model, all arrays stay in CPU memory.
I noticed that a mx.io.DataBatch only accepts a list of mx.ndarray.NDArrays as its data and label parameters, not numpy arrays; passing numpy arrays appears to work until you actually feed the batch into a model. On the other hand, I have a C library that can write directly into a C-contiguous array.
I would like to avoid the memory copy in step 4. One possible way would be to somehow get a raw pointer to the buffer of an NDArray, ignoring numpy entirely. But whatever works.
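For reference, steps 1-3 currently look roughly like this (libgen.so and fill_buffer are placeholder names standing in for the custom C library):

import ctypes
import numpy as np

buf = np.empty((4, 4), dtype=np.float32)    # step 1: C-contiguous buffer
ptr = buf.__array_interface__['data'][0]    # step 2: raw address as a plain int
lib = ctypes.CDLL("./libgen.so")            # hypothetical custom C library
lib.fill_buffer(ctypes.c_void_p(ptr), ctypes.c_size_t(buf.size))  # step 3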
I figured out a hacky way to achieve this. Here's a small example.
from ctypes import *
import numpy as np
import mxnet as mx
m = mx.ndarray.zeros((4,4))
m.wait_to_read() # make sure the data is allocated
c_uint64_p = POINTER(c_uint64)
handle = cast(m.handle, c_uint64_p) # NDArray*
ptr_ = cast(handle[0], c_uint64_p) # shared_ptr<Chunk>
dptr = cast(ptr_[0], POINTER(c_float)) # shandle.dptr
n = np.ctypeslib.as_array(dptr, shape=(4,4)) # m and n will share buffer
I derived the above code by looking at MxNet C++ source code. Some explanation:
First, note the NDArray.handle attribute. It's a c_void_p; reading the Python source code shows it is an NDArrayHandle. Now dive into src/c_api/c_api_ndarray.cc, where it is reinterpreted as an NDArray*.
In the source tree, go to include/mxnet/ndarray.h and find the NDArray class. The first field is:
/*! \brief internal data of NDArray */
std::shared_ptr<Chunk> ptr_{nullptr};
Checking Chunk, which is a struct defined inside NDArray, we see:
/*! \brief the real data chunk that backs NDArray */
// shandle is used to store the actual values in the NDArray
// aux_handles store the aux data(such as indices) if it's needed by non-default storage.
struct Chunk {
  /*! \brief storage handle from storage engine.
   for non-default storage, shandle stores the data(value) array.
  */
  Storage::Handle shandle;
Finally, shandle is defined in include/mxnet/storage.h:
struct Handle {
  /*!
   * \brief Pointer to the data.
   */
  void* dptr{nullptr};
Writing a small program shows that sizeof(shared_ptr<some_type>) is 16. Based on this question, we can guess that a shared_ptr is composed of two pointers. It is not too hard to figure out that the first one is the pointer to the data. Putting everything together, all that is needed is two pointer dereferences.
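As a quick sanity check that m and n really do alias the same buffer (assuming the pointer chasing above worked on your build):

n[0, 0] = 42.0
print(m.asnumpy()[0, 0])  # prints 42.0: the write through n is visible in m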
On the down side, this method cannot be used in a production environment or in large projects. It could break in a future release, or introduce tough bugs and security holes.

Why does numpy.zeros take up so little space

I am wondering why numpy.zeros takes up so little space.
x = numpy.zeros(200000000)
This takes up almost no memory, while
x = numpy.repeat(0,200000000)
takes up around 1.5GB. Does numpy.zeros create an array of empty pointers?
If so, is there a way to set the pointer back to empty in the array after changing it in cython? If I use:
x = numpy.zeros(200000000)
x[0:200000000] = 0.0
The memory usage goes way up. Is there a way to change a value, and then change it back to the state numpy.zeros originally had it in, from Python or Cython?
Are you using Linux? Linux allocates memory lazily: the underlying calls to malloc and calloc in numpy always 'succeed', but no physical memory is actually allocated until it is first accessed.
The zeros function uses calloc, which guarantees that allocated memory reads as zero the first time it is accessed. Numpy therefore does not need to explicitly zero the array, and the array is lazily initialised. The repeat function, by contrast, cannot rely on calloc to initialise the array; it must use malloc and then copy the repeated value into every element, which forces immediate allocation.
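You can watch this happen by checking the process's resident set size before and after touching the pages (a sketch assuming the third-party psutil package is installed):

import os
import numpy as np
import psutil

proc = psutil.Process(os.getpid())

def rss_mb():
    return proc.memory_info().rss / 2**20

print(rss_mb())             # baseline
x = np.zeros(200_000_000)   # calloc: pages reserved but not yet faulted in
print(rss_mb())             # barely changed
x[:] = 0.0                  # writing touches every page
print(rss_mb())             # roughly 1500 MB higher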

apply two functions to the two halves of a numpy array

I am trying to find out how to apply two functions to a numpy array, each one to only half of the values.
Here is the code I have been trying:
import numpy as np

def hybrid_array(xs, height, center, fwhh):
    xs[xs <= center] = height*np.exp((-(xs[xs <= center]-center)**2)/(2*(fwhh/(2*np.sqrt(2*np.log(2))))**2))
    xs[xs > center] = height*1/(np.abs(1+((center-xs[xs > center])/(fwhh/2))**2))
    return xs
However, I am overwriting the initial array that is passed as the argument. The usual trick of copying it with a slice, i.e. the following, still changes b (for numpy arrays, b[:] returns a view, not a copy):
a = b[:]
c = hybrid_array(a,args)
If there is a better way of doing any part of what I am trying, I would be very grateful if you could let me know as I am still new to numpy arrays.
Thank you
Try copy.deepcopy to copy the array b onto a before calling your function.
import copy
a = copy.deepcopy(b)
c = hybrid_array(a,args)
Alternatively, you can use the copy method of the array
a = b.copy()
c = hybrid_array(a,args)
Note: you may be wondering why, despite the easier copy method of numpy arrays, I introduced copy.deepcopy. Others may disagree, but here is my reasoning:
Using deepcopy makes it clear that you intend a deep copy rather than a reference copy.
Not all of Python's data types support the copy method. Numpy arrays have it, and it is good that they do, but when programming with numpy and Python you may end up using various numpy and non-numpy data types, not all of which support the copy method. To remain consistent, I would prefer the first.
Copying a NumPy array a is done with a.copy(). In your application, however, there is no need to copy the old data. All you need is a new array of the same shape and dtype as the old one. You can use
result = numpy.empty_like(xs)
to create such an array. If you generally don't want your function to modify its parameter, you should do this inside the function, rather than requiring the caller to take care of this.
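For example, a sketch of the function reworked along those lines, keeping the original formulas but writing into a fresh array instead of mutating the input:

import numpy as np

def hybrid_array(xs, height, center, fwhh):
    result = np.empty_like(xs)           # same shape and dtype, no copy of data
    below = xs <= center                 # boolean masks for the two halves
    above = ~below
    sigma = fwhh / (2 * np.sqrt(2 * np.log(2)))
    result[below] = height * np.exp(-(xs[below] - center)**2 / (2 * sigma**2))
    result[above] = height / np.abs(1 + ((center - xs[above]) / (fwhh / 2))**2)
    return result                        # xs is left untouched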
