I am working on manipulating numpy arrays using the multiprocessing module and am running into an issue trying out some of the code I have run across here. Specifically, I am creating a ctypes array from a numpy array and then trying to convert the ctypes array back to a numpy array. Here is the code:
shared_arr = multiprocessing.RawArray(_numpy_to_ctypes[array.dtype.type], array.size)
I do not need any kind of synchronization lock, so I am using RawArray. The ctypes data type is pulled from a dictionary based on the dtype of the input array. That is working wonderfully.
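For context, _numpy_to_ctypes is a dtype-to-ctypes lookup along these lines (the exact entries are sketched here for illustration; the real dictionary may cover more types):

import ctypes
import numpy

# Illustrative sketch of the dtype-to-ctypes lookup described above.
_numpy_to_ctypes = {
    numpy.float64: ctypes.c_double,
    numpy.float32: ctypes.c_float,
    numpy.int64: ctypes.c_int64,
    numpy.int32: ctypes.c_int32,
}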
shared_arr = numpy.ctypeslib.as_array(shared_arr.get_obj())
Here I get a stack trace stating:
AttributeError: 'c_double_Array_16154769' object has no attribute 'get_obj'
I have also tried the following from this post, but get an identical error.
def tonumpyarray(shared_arr):
    return numpy.frombuffer(shared_arr.get_obj())
I am stuck running Python 2.6 and do not know if that is the issue, if it is a problem with how I am sharing the variable name, or something else, as I am just learning about this component of Python. (I am trying to keep memory usage as low as possible, so I am trying not to hold both the numpy array and the ctypes array in memory at once.)
Suggestions?
Since you use RawArray, it's just a ctypes array allocated from shared memory. There is no wrapping object, so there is no get_obj() method; that method exists only on arrays that wrap the raw array in a synchronization lock. Pass the RawArray directly:
>>> import multiprocessing
>>> import numpy as np
>>> shared_arr = multiprocessing.RawArray("d", 10)
>>> t = np.frombuffer(shared_arr, dtype=float)
>>> t[0] = 2
>>> shared_arr[0]
2.0
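For contrast, get_obj() exists only on an array created with multiprocessing.Array, which wraps the raw array in a synchronization lock; a sketch of that variant (same imports as above):

>>> locked_arr = multiprocessing.Array("d", 10)
>>> t = np.frombuffer(locked_arr.get_obj(), dtype=float)
>>> t[0] = 2
>>> locked_arr[0]
2.0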
I am just wondering how to solve this attribute error in Python 3.6.
The error is
'list' object has no attribute 'astype'.
My related code is below.
def _init_mean_std(self, data):
    data = data.astype('float32')
    self.mean, self.std = np.mean(data), np.std(data)
    self.save_meanstd()
    return data
Can anyone advise me?
Thank you!
The root issue is confusion between Python lists and NumPy arrays, which are different data types. NumPy functions that are invoked as np.foo(array) usually won't complain if you give them a Python list; they will convert it to a NumPy array silently. But if you try to invoke a method on the object itself, like array.foo(), then of course the object has to have the appropriate type already.
I would suggest using
data = np.array(data, dtype=np.float32)
so that the type of an array is known to NumPy at once. This avoids unnecessary work where you first create an array and then cast it to another type.
Using dtype objects like np.float32 instead of strings like "float32" also makes the intent more explicit; NumPy accepts both.
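A quick sketch of the difference:

import numpy as np

data = [1.0, 2.0, 3.0]           # a plain Python list
print(np.mean(data))             # works: np.mean converts the list internally
# data.astype('float32')         # AttributeError: lists have no astype method

data = np.array(data, dtype=np.float32)   # convert once, up front
print(data.astype('float64').dtype)       # now astype works: float64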
I've created a shared library, and I'm using it like this:
import ctypes
import numpy
import numpy.ctypeslib as npct

class CAudioRecoveryStrategy(AbstractAudioRecoveryStrategy):
    def __init__(self):
        array_1d_double = npct.ndpointer(dtype=numpy.double, ndim=1, flags='CONTIGUOUS')
        self.lib = npct.load_library("libhello", ".")
        self.lib.demodulate.argtypes = [array_1d_double, array_1d_double, ctypes.c_int]

    def demodulate(self, input):
        output = numpy.empty_like(input)
        self.lib.demodulate(input, output, input.size)
        return output
Right now I have a problem: in the C++ code I only have a pointer to the output data, not the array itself, so I can't return the array unless I manually copy it.
What is the right way to do this? It must be efficient (aligned memory, etc.).
Numpy arrays implement the buffer protocol, see https://docs.python.org/2/c-api/buffer.html. In particular, parse the input object to a PyObject* (conversion O if you're using PyArg_ParseTuple or PyArg_ParseTupleAndKeywords), then do PyObject_CheckBuffer to ensure that the type supports the protocol (numpy arrays do), then PyObject_GetBuffer to fill in a Py_buffer struct with the physical address, dimensions, etc. of the underlying memory block. Returning a numpy buffer is more complicated; in general, I've found it sufficient to create objects of my own type which also support the buffer protocol (set tp_as_buffer to non-null in the PyTypeObject). Otherwise (but I've not actually tried this), you'll have to import the numpy module, get its array attribute, call it with the correct arguments, and then use the buffer protocol above on the object you thus construct.
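As a quick Python-side illustration of the first point: numpy arrays really do expose the buffer protocol, and a memoryview built on it shares memory with the array (a sketch, not the C-API code itself):

import numpy as np

a = np.arange(6, dtype=np.float64)

# memoryview() uses the same buffer protocol that PyObject_GetBuffer
# exposes at the C level: no data is copied.
view = memoryview(a)
print(view.format, view.shape)   # 'd' (6,)

# Writing through the view mutates the array, proving the memory is shared.
view[0] = 42.0
print(a[0])                      # 42.0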
I have a C function that mallocs() and populates a 2D array of floats. It "returns" that address and the size of the array. The signature is
int get_array_c(float** addr, int* nrows, int* ncols);
I want to call it from Python, so I use ctypes.
import ctypes
mylib = ctypes.cdll.LoadLibrary('mylib.so')
get_array_c = mylib.get_array_c
I never figured out how to specify argument types with ctypes. I tend to just write a Python wrapper for each C function I'm using, and make sure I get the types right in the wrapper. The array of floats is a matrix in column-major order, and I'd like to get it as a numpy.ndarray. But it's pretty big, so I want to use the memory allocated by the C function, not copy it. (I just found this PyBuffer_FromMemory stuff in this StackOverflow answer: https://stackoverflow.com/a/4355701/3691)
buffer_from_memory = ctypes.pythonapi.PyBuffer_FromMemory
buffer_from_memory.restype = ctypes.py_object
import numpy
def get_array_py():
    nrows = ctypes.c_int()
    ncols = ctypes.c_int()
    addr_ptr = ctypes.POINTER(ctypes.c_float)()
    get_array_c(ctypes.byref(addr_ptr), ctypes.byref(nrows), ctypes.byref(ncols))
    buf = buffer_from_memory(addr_ptr, 4 * nrows.value * ncols.value)
    return numpy.ndarray((nrows.value, ncols.value), dtype=numpy.float32,
                         order='F', buffer=buf)
This seems to give me an array with the right values. But I'm pretty sure it's a memory leak.
>>> a = get_array_py()
>>> a.flags.owndata
False
The array doesn't own the memory. Fair enough; by default, when the array is created from a buffer, it shouldn't. But in this case it should. When the numpy array is deleted, I'd really like python to free the buffer memory for me. It seems like if I could force owndata to True, that should do it, but owndata isn't settable.
Unsatisfactory solutions:
Make the caller of get_array_py() responsible for freeing the memory. That's super annoying; the caller should be able to treat this numpy array just like any other numpy array.
Copy the original array into a new numpy array (with its own, separate memory) in get_array_py(), delete the first array, and free the memory inside get_array_py(). Return the copy instead of the original array. This is annoying because it makes a memory copy that ought to be unnecessary.
Is there a way to do what I want? I can't modify the C function itself, although I could add another C function to the library if that's helpful.
I just stumbled upon this question, which is still an issue in August 2013. Numpy is really picky about the OWNDATA flag: There is no way it can be modified on the Python level, so ctypes will most likely not be able to do this. On the numpy C-API level - and now we are talking about a completely different way of making Python extension modules - one has to explicitly set the flag with:
PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);
On numpy < 1.7, one had to be even more explicit:
((PyArrayObject*)arr)->flags |= NPY_OWNDATA;
If one has any control over the underlying C function/library, the best solution is to pass it an empty numpy array of the appropriate size from Python to store the result in. The basic principle is that memory allocation should always be done on the highest level possible, in this case on the level of the Python interpreter.
As kynan commented below, if you use Cython, you have to expose the function PyArray_ENABLEFLAGS manually, see this post Force NumPy ndarray to take ownership of its memory in Cython.
The relevant documentation is here and here.
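The ownership flag is easy to observe, but not to change, from Python (a quick sketch):

import numpy as np

a = np.empty(4, dtype=np.float32)   # numpy allocated this buffer itself
print(a.flags.owndata)              # True: numpy frees it with the array

b = np.frombuffer(a.tobytes(), dtype=np.float32)  # wraps foreign memory
print(b.flags.owndata)              # False: numpy will not free this buffer

# b.flags.owndata = True  # not possible: OWNDATA cannot be set from Python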
I would tend to have two functions exported from my C library:
int get_array_c_nomalloc(float* addr, int nrows, int ncols); /* Pass addr as argument */
int get_array_c(float **addr, int nrows, int ncols); /* Calls function above */
I would then write my Python wrapper[1] of get_array_c to allocate the array, then call get_array_c_nomalloc. Then Python does own the memory. You could integrate this wrapper into your library so your user never has to be aware of get_array_c_nomalloc's existence.
[1] This isn't really a wrapper anymore, but instead is an adapter.
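A sketch of what that adapter might look like on the Python side, assuming the dimensions are known to the caller (or obtainable from a separate query function); the library name and argtypes here are illustrative:

import ctypes
import numpy as np
import numpy.ctypeslib as npct

lib = ctypes.cdll.LoadLibrary('mylib.so')  # placeholder library name
lib.get_array_c_nomalloc.argtypes = [
    npct.ndpointer(dtype=np.float32, ndim=2, flags='F_CONTIGUOUS'),
    ctypes.c_int,
    ctypes.c_int,
]
lib.get_array_c_nomalloc.restype = ctypes.c_int

def get_array_py(nrows, ncols):
    # Python allocates, so the returned array owns its memory and the
    # buffer is freed automatically when the array is garbage collected.
    out = np.empty((nrows, ncols), dtype=np.float32, order='F')
    lib.get_array_c_nomalloc(out, nrows, ncols)
    return out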
I have a numpy array that I would like to share between a bunch of python processes in a way that doesn't involve copies. I create a shared numpy array from an existing numpy array using the sharedmem package.
import sharedmem as shm
def convert_to_shared_array(A):
    shared_array = shm.shared_empty(A.shape, A.dtype, order="C")
    shared_array[...] = A
    return shared_array
My problem is that each subprocess needs to access rows that are randomly distributed in the array. Currently I create a shared numpy array using the sharedmem package and pass it to each subprocess. Each process also has a list, idx, of rows that it needs to access. The problem is in the subprocess when I do:
#idx = list of randomly distributed integers
local_array = shared_array[idx,:]
# Do stuff with local array
It creates a copy of the array instead of just another view. The array is quite large, and rearranging it before sharing it so that each process accesses a contiguous range of rows, like
local_array = shared_array[start:stop,:]
takes too long.
Question: What are good solutions for sharing random access to a numpy array between python processes that don't involve copying the array?
The subprocesses need readonly access (so no need for locking on access).
Fancy indexing induces a copy, so if you want to avoid copies you need to avoid fancy indexing; there is no way around it.
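A quick sketch showing the difference between basic slicing (a view) and fancy indexing (a copy):

import numpy as np

shared_array = np.arange(20.0).reshape(4, 5)

view = shared_array[1:3, :]                    # basic slicing: a view
print(np.shares_memory(shared_array, view))    # True

idx = [0, 3]                                   # fancy indexing: a copy
local = shared_array[idx, :]
print(np.shares_memory(shared_array, local))   # False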
How do you create an array of a defined length and of a certain type in Python? To be precise, I am trying to create an array of handles that can hold up to 1024 records. I figured out an analog to the HANDLE type in Python, which would be ctypes' c_void_p.
For example C++ code would have:
HANDLE myHandles[1024];
What would be the Python analog of the C++ code above?
Thank you for your input.
You've already accepted an answer, but since you tagged ctypes you might want to know how to create arrays of ctypes types:
>>> import ctypes
>>> ctypes.c_void_p * 1024 # NOTE: this is a TYPE
<class '__main__.c_void_p_Array_1024'>
>>> (ctypes.c_void_p * 1024)() # This is an INSTANCE
<__main__.c_void_p_Array_1024 object at 0x009BB5D0>
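An instance then behaves like a fixed-length sequence; for example (a quick sketch):

>>> handles = (ctypes.c_void_p * 1024)()  # all elements start as NULL
>>> handles[0] = 0xDEADBEEF               # store a raw pointer value
>>> len(handles)
1024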
In Python, you generally just create a list, and you can put any values you like in it. It's dynamic.
my_handles = []
You can now put as many values of any type in it as you want.
...is there a specific reason you want to create a specific type and a specific length?