Briefly:
Is there an efficient way to make a numpy array given a pointer in memory to the array, its type, and the number of elements?
More detail:
I am working with a Python framework which has an object.GetData() command that is supposed to return a pointer to the data (an array of 35,000 int8 values) of this object.
I'm supposed to be able to efficiently load these integers into a numpy array through
arr = numpy.frombuffer(object.GetData(), count=35000, dtype="int8")
but this doesn't seem to work: I get the error ValueError: buffer is smaller than requested size. Changing the count, I can get it to output an array, but typically fewer than 20 integers long (usually 0 or 1 integers).
I believe I can access the pointer to the start of the array, in hex form, through
hex(id(object.GetData()))
which looks like it gives addresses (e.g. 0x10fd8c670), but I don't know if this is the actual address of the data.
I'm more comfortable in Python than C++, but there could be a bug in the C++ code. The C++ code for GetData is:
const _Tp* GetData() const
{
    // Return a const pointer to the internal data
    return (fData.size() > 0) ? &(fData)[0] : NULL;
}
where fData is initialized as a VecType through:
VecType fData;
Right now I can access each element of the object's data through an object.At(i) command, where i is the index into the object's data array, but it is very slow to load each element into a numpy array this way, and I'm dealing with a lot of data. For reference, the At command in the C++ code does this:
_Tp At(size_t i) const
{
    return fData.at(i);
}
Any help would be appreciated. I don't have a ton of experience with pointers, and even less with pointers in Python, but I would like to figure this out in Python rather than rewrite all my code in C++. Thanks!
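For what it's worth, one approach I have seen suggested is to go through ctypes, assuming the framework can expose the raw address as a plain integer (the GetDataAddress accessor below is hypothetical; mine doesn't have it):

import ctypes
import numpy as np

addr = obj.GetDataAddress()  # hypothetical: raw address of the int8 buffer as an int
ptr = ctypes.cast(addr, ctypes.POINTER(ctypes.c_int8))
arr = np.ctypeslib.as_array(ptr, shape=(35000,))  # no copy; views the C buffer directly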
Related
I am trying to convert some Python code into Cython. In the Python code I use data of type
array.array('i', [...]) and use the method array.insert to insert an element at a specific index. In Cython, however, when I try to insert an element using the same method I get this error: BufferError: cannot resize an array that is exporting buffers
basically:
from cpython cimport array

cdef array.array[int] a = array.array('i', [1, 2, 3, 3])
a.insert(1, 5)  # insert 5 at index 1 -> throws BufferError
I have been looking at cyappend3 from this answer, but I am using libcpp and am not sure I understand the magic written there.
Any idea how to insert an element at a specific index in an array.array?
Partial answer:
BufferError: cannot resize an array that is exporting buffers
This is telling you that you have a memoryview (or similar) of the array somewhere. It isn't possible to resize the array because that memoryview is looking directly into the array's data, and resizing would require reallocating that data. You can replicate this error in Python too if you do view = memoryview(arr) before you try to insert.
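A minimal pure-Python reproduction of the same error:

import array

arr = array.array('i', [1, 2, 3, 3])
view = memoryview(arr)  # exports a buffer into arr's data
arr.insert(1, 5)        # BufferError: cannot resize an array that is exporting buffers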
In your case:
cdef array.array[int] a = array.array('i', [1, 2, 3, 3])
cdef array.array[int] a defines an array with a fast buffer into the array's elements, and it's this buffer that prevents you from resizing it. If you just do cdef array.array a it works fine. Obviously you lose the fast buffer access to individual elements, but that's because you were trying to change the data out from under the buffer.
I strongly recommend you don't resize arrays, though. Not only does it involve an O(n) copy of every element; unlike a Python list, an array doesn't over-allocate, so even append causes a complete reallocation and copy every time (i.e. it is O(n) rather than amortized O(1)).
Instead I'd suggest keeping the data as a Python list (or maybe something else) until you've finalized the length and only then converting to array.
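For example, a sketch of that pattern:

import array

data = []                   # grow freely while collecting values
for x in range(100):
    data.append(x)
a = array.array('i', data)  # convert once, when the length is final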
The answer above (https://stackoverflow.com/a/74285371/4529589) is correct, and I have the same recommendation.
However, I want to add that if you need insert and still want a typed C-level buffer, you can use std::vector. This will be faster.
from libcpp.vector cimport vector

cdef vector[int] vect = array.array('i', [1, 2, 3, 3])
vect.insert(vect.begin() + 1, 5)  # insert 5 at index 1
I also recommend that, if you go with this solution, you drop array.array entirely and just use a vector from the beginning.
I have a Python script that I have to translate into C++, and 80% of my Python script is based on lists.
I have a file that I read, and I put the data from that file into a list:
# Code to translate to C++
bloc = [line]
for b in range(11):
    bloc.append(lines[i + 1])
    i += 1
I do my processing with that data and then do it again until I have read the whole file.
And finally I want to be able to get data from this list, doing something like:
# Python script
var = bloc[0, 1, 2, 3 ...]
I'll respond to any questions if you need more info.
The C++ container closest to a Python list is std::vector. However, contrary to Python, a std::vector contains only one type of element: you have to declare what the vector will hold.
In your case it would be std::string (since you are reading lines from a file).
So:
std::vector<std::string> cpp_list; // container for lines (stored as std::string) from the file
is equivalent to Python's python_list = [] and should get you started.
With a std::vector you do not strictly need to allocate storage upfront, but for performance reasons it is better to do so if you know the required size in advance.
If you use cpp_list.reserve(something) or do not do any memory allocation, you must push into the vector using cpp_list.push_back(...), which is similar to python_list.append(...).
If you allocate memory upfront, e.g. std::vector<std::string> cpp_list(nb_lines),
you must use indexing as in Python, e.g. cpp_list[3] = something.
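Putting it together, a minimal sketch (the file name data.txt is an assumption, and error handling is omitted):

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::ifstream file("data.txt");     // assumed input file
    std::vector<std::string> cpp_list;  // python_list = []
    std::string line;
    while (std::getline(file, line))
        cpp_list.push_back(line);       // python_list.append(line)

    if (!cpp_list.empty())
        std::cout << cpp_list[0] << '\n';  // like bloc[0] in Python
    return 0;
}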
This is my first question on this site.
First of all, I need to make a module with one function for Python in C++, which must work with NumPy, using <numpy/arrayobject.h>. This function takes one numpy array and returns two numpy arrays. All arrays are one-dimensional.
The first question is: how do I get the data from a numpy array? I want to collect the information from the array in a std::vector, so that I can then easily work with it in C++.
The second: am I right that the function should return a tuple of arrays, so that the user of my module can write this in Python:
arr1, arr2 = foo(arr)
?
And how do I return it like this?
Thank you very much.
NumPy includes lots of functions and macros that make it pretty easy to access the data of an ndarray object within a C or C++ extension. Given a 1D ndarray called v, one can access element i with PyArray_GETPTR1(v, i). So if you want to copy each element in the array to a std::vector of the same type, you can iterate over each element and copy it, like so (I'm assuming an array of doubles):
npy_intp vsize = PyArray_SIZE(v);
std::vector<double> out(vsize);
for (npy_intp i = 0; i < vsize; i++) {
    out[i] = *reinterpret_cast<double*>(PyArray_GETPTR1(v, i));
}
One could also do a bulk memcpy-like operation, but keep in mind that NumPy ndarrays may be misaligned for the data type, have non-native byte order, or have other subtle attributes that make such copies less than desirable. But assuming you are aware of these, one could do:
npy_intp vsize = PyArray_SIZE(v);
std::vector<double> out(vsize);
std::memcpy(out.data(), PyArray_DATA(v), sizeof(double) * vsize);
Using either approach, out now contains a copy of the ndarray's data, and you can manipulate it however you like. Keep in mind that, unless you really need the data as a std::vector, the NumPy C API may be perfectly fine to use in your extension as a way to access and manipulate the data. That is, unless you need to pass the data to some other function which must take a std::vector or you want to use C++ library code that relies on std::vector, I'd consider doing all your processing directly on the native array types.
As to your last question, one generally uses Py_BuildValue to construct a tuple to return from your extension functions. Your tuple would just contain two ndarray objects.
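A minimal sketch of that return step, assuming arr1 and arr2 are the two ndarray objects your function created:

// "N" packs each object into the tuple and steals its reference,
// so no extra Py_DECREF is needed for arr1 and arr2 afterwards.
return Py_BuildValue("NN", arr1, arr2);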
I've been writing a Python extension that writes into a NumPy array from C. During testing, I noticed that certain very large arrays would generate a segfault when I tried to access some of their elements.
Specifically, the last line of the following code segment fails with a segfault:
// Size of the buffer we will write to
npy_intp buffer_len_alt = BUFFER_LENGTH;

PyArray_Descr *dtype = PyArray_DescrFromType(NPY_BYTE);
PyObject *column = PyArray_Zeros(1, &buffer_len_alt, dtype, 0);

// Check that array creation succeeded
if (column == NULL) {
    // This exit point is not reached, so it looks like everything is OK
    return (PyObject *) NULL;
}

// Get the array's internal buffer so we can write to it
output_buffer = PyArray_BYTES((PyArrayObject *) column);

// Try writing to the buffer
output_buffer[0] = 'x';                          // no segfault
output_buffer[((int) buffer_len_alt) - 1] = 'x'; // segfault here
I checked and found that the error occurs only when I try to allocate an array of about 3 GB (i.e. BUFFER_LENGTH is about 3*2^30). It's not surprising that an allocation of this size would fail, even if Python is using its custom allocator. What really concerns me is that NumPy did not raise an error or otherwise indicate that the array creation did not go as planned.
I have already tried checking PyArray_ISCONTIGUOUS on the returned array, and using PyArray_GETCONTIGUOUS to ensure it is a single memory segment, but the segfault would still occur. NPY_ARRAY_DEFAULT creates contiguous arrays, so this shouldn't be necessary anyway.
Is there some error flag I should be checking? How can I detect/prevent this situation in the future? Setting BUFFER_LENGTH to a smaller value obviously works, but this value is determined at runtime and I would like to know the exact bounds.
EDIT:
As @DavidW pointed out, the error stems from casting buffer_len_alt to an int, since npy_intp can be a 64-bit number. Replacing the cast to int with a cast to unsigned long fixes the problem for me.
The issue (diagnosed in the comments) was actually with the array lookup rather than with the allocation of the array. Your code contained the line
output_buffer[((int) buffer_len_alt) - 1] = 'x';
When buffer_len_alt (approximate value 3,000,000,000) was cast to a (32-bit) int (maximum value 2,147,483,647), you ended up with an invalid address, probably a large negative number.
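For illustration (the exact wrapped value is implementation-defined, but this is the typical result):

npy_intp buffer_len_alt = 3000000000;  // fits comfortably in a 64-bit npy_intp
int truncated = (int) buffer_len_alt;  // wraps on a 32-bit int: -1294967296
// output_buffer[truncated - 1] points far outside the allocation -> segfault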
The solution is just to use
output_buffer[buffer_len_alt - 1] = 'x'
(i.e. I don't see why you should need a cast at all).
I am currently using Python to parse a C file using LibClang. I've encountered a problem while reading a C array whose size is defined by a #define directive.
With node.get_children I can perfectly read the following array:
int myarray[20][30][10];
As soon as an array dimension is replaced with a macro, the array isn't read correctly. The following array code can't be read:
#define MAX 60;
int myarray[MAX][30][10];
The parser actually stops at MAX, and in the dump there is the error: invalid sloc.
How can I solve this?
Thanks
Run the code through a C preprocessor before trying to parse it. That will cause all preprocessor symbols to be replaced by their values, i.e. your [MAX] will become [60].
Note that C code can also do this:
const int three[] = { 1, 2, 3 };
i.e. let the compiler deduce the length of the array from the number of initializer values given.
Or, from C99, even this:
const int hundred[] = { [99] = 4711 };
So a naive approach might still break, but I don't know anything about the capabilities of the parser you're using, of course.
The semicolon ; in the define directive was causing the error.
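To see why, look at what the preprocessor produces:

#define MAX 60;
int myarray[MAX][30][10];  // expands to: int myarray[60;][30][10]; -> invalid C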