C array to PyArray - python

I'm writing a Python C-Extension without using Cython.
I want to allocate a double array in C, use it in an internal function (that happens to be in Fortran) and return it. I point out that the C-Fortran interface works perfectly in C.
static PyObject *
Py_drecur(PyObject *self, PyObject *args)
{
// INPUT
int n;
int ipoly;
double al;
double be;
if (!PyArg_ParseTuple(args, "iidd", &n, &ipoly, &al, &be))
return NULL;
// OUTPUT
int nd = 1;
npy_intp dims[] = {n};
double a[n];
double b[n];
int ierr;
drecur_(n, ipoly, al, be, a, b, ierr);
// Create PyArray
PyObject* alpha = PyArray_SimpleNewFromData(nd, dims, NPY_DOUBLE, a);
PyObject* beta = PyArray_SimpleNewFromData(nd, dims, NPY_DOUBLE, b);
Py_INCREF(alpha);
Py_INCREF(beta);
return Py_BuildValue("OO", alpha, beta);
}
I debugged this code and I get a Segmentation fault when I try to create alpha out of a. Up to there everything works fine. The function drecur_ works and I get the same problem if it is removed.
Now, what's the standard way of defining a PyArray around C data? I found documentation but no good example. Also, what about memory leakage? Is it correct to INCREF before return so that the instances of alpha and beta are preserved? What about deallocation when they are not needed anymore?
EDIT
I finally got it right with the approach found in NumPy cookbook.
static PyObject *
Py_drecur(PyObject *self, PyObject *args)
{
// INPUT
int n;
int ipoly;
double al;
double be;
double *a, *b;
PyArrayObject *alpha, *beta;
if (!PyArg_ParseTuple(args, "iidd", &n, &ipoly, &al, &be))
return NULL;
// OUTPUT
int nd = 1;
int dims[2];
dims[0] = n;
alpha = (PyArrayObject*) PyArray_FromDims(nd, dims, NPY_DOUBLE);
beta = (PyArrayObject*) PyArray_FromDims(nd, dims, NPY_DOUBLE);
a = pyvector_to_Carrayptrs(alpha);
b = pyvector_to_Carrayptrs(beta);
int ierr;
drecur_(n, ipoly, al, be, a, b, ierr);
return Py_BuildValue("OO", alpha, beta);
}
double *pyvector_to_Carrayptrs(PyArrayObject *arrayin) {
int n=arrayin->dimensions[0];
return (double *) arrayin->data; /* pointer to arrayin data as double */
}
Feel free to comment on this and thanks for the answers.

So the first thing that looks suspicious is that your arrays a and b are in the local scope of the function. That means after the return you will get an illegal memory access.
So you should declare the arrays with
double *a = malloc(n*sizeof(double));
double *b = malloc(n*sizeof(double));
Then you need to make sure that the memory is later freed by the object you have created.
See this quote of the documentation:
PyObject *PyArray_SimpleNewFromData(int nd, npy_intp *dims, int typenum, void *data)
Sometimes, you want to wrap memory allocated elsewhere into an ndarray object for downstream use. This routine makes it straightforward to do that. The first three arguments are the same as in PyArray_SimpleNew, the final argument is a pointer to a block of contiguous memory that the ndarray should use as its data-buffer, which will be interpreted in C-style contiguous fashion. A new reference to an ndarray is returned, but the ndarray will not own its data. When this ndarray is deallocated, the pointer will not be freed.
You should ensure that the provided memory is not freed while the returned array is in existence. The easiest way to handle this is if data comes from another reference-counted Python object. The reference count on this object should be increased after the pointer is passed in, and the base member of the returned ndarray should point to the Python object that owns the data. Then, when the ndarray is deallocated, the base-member will be DECREF’d appropriately. If you want the memory to be freed as soon as the ndarray is deallocated then simply set the OWNDATA flag on the returned ndarray.
For your second question, the Py_INCREF(alpha); is generally only necessary if you intend to keep the reference around somewhere else, e.g. in a global variable or a class member.
But since you are only wrapping a function you don't have to do it.
PyArray_SimpleNewFromData already returns a new reference (its count is 1), so there is no need to increase it further.
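Putting that together for the function in the question, a minimal sketch (my own assembly, not tested against the poster's Fortran routine; it assumes NumPy >= 1.7 for PyArray_ENABLEFLAGS and that drecur_ is callable exactly as written in the question) could look like this:
#include <Python.h>
#include <numpy/arrayobject.h>  /* the module init must call import_array() */
#include <stdlib.h>

/* drecur_ is the external Fortran routine, declared elsewhere */

static PyObject *
Py_drecur(PyObject *self, PyObject *args)
{
    int n, ipoly, ierr;
    double al, be;
    if (!PyArg_ParseTuple(args, "iidd", &n, &ipoly, &al, &be))
        return NULL;

    /* heap-allocate so the buffers outlive this function (error checks omitted) */
    npy_intp dims[] = {n};
    double *a = malloc(n * sizeof(double));
    double *b = malloc(n * sizeof(double));
    drecur_(n, ipoly, al, be, a, b, ierr);  /* called as in the question */

    PyObject *alpha = PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE, a);
    PyObject *beta  = PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE, b);
    /* let NumPy free() the buffers when the arrays are garbage collected */
    PyArray_ENABLEFLAGS((PyArrayObject *)alpha, NPY_ARRAY_OWNDATA);
    PyArray_ENABLEFLAGS((PyArrayObject *)beta, NPY_ARRAY_OWNDATA);
    /* "NN" steals both references, so no extra INCREF/DECREF is needed */
    return Py_BuildValue("NN", alpha, beta);
}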

One problem might be that your arrays (a, b) have to last at least as long as the numpy array that wraps them. You've created your arrays in local scope, so they will be destroyed when you leave the function.
Try to have Python allocate the arrays (e.g. using PyArray_SimpleNew), then either copy your data into them or fill them directly through their data pointers, as sketched below. You might also want to use boost::python to take care of these details, if building against boost is an option.
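As a sketch of that approach (reusing the variables from the question, and filling the NumPy-owned buffers in place rather than copying), the OUTPUT part of the function could read:
npy_intp dims[] = {n};
int ierr;
PyObject *alpha = PyArray_SimpleNew(1, dims, NPY_DOUBLE);
PyObject *beta  = PyArray_SimpleNew(1, dims, NPY_DOUBLE);
double *a = (double *) PyArray_DATA((PyArrayObject *) alpha);
double *b = (double *) PyArray_DATA((PyArrayObject *) beta);
drecur_(n, ipoly, al, be, a, b, ierr);   /* fills the arrays in place */
return Py_BuildValue("NN", alpha, beta); /* "NN" steals both references */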

Related

How to create a simple numpy array using the C API, with proper memory management?

I have two related questions regarding creating a numpy array using the C API. Given a newly created numpy array:
std::vector<double> vec({0.1, 0.2});
npy_intp length = static_cast<npy_intp>(vec.size());
double* data = new double[length];
std::copy(vec.begin(), vec.end(), data);
PyObject* obj = PyArray_SimpleNewFromData(1, &length, NPY_DOUBLE, (void*)data);
How do I ensure proper memory management?
I didn't want to give PyArray_SimpleNewFromData a pointer to vec's data since that's owned by vec, so I copied the data into newly allocated memory. However, will numpy/python "just know" it needs deleting at end of scope? The docs mention setting the OWNDATA flag, is this appropriate for heap-allocated memory?
How do I get a PyArrayObject* from a PyObject* returned from PyArray_SimpleNewFromData?
All the new array creation mechanisms such as PyArray_SimpleNewFromData return a PyObject*, but if you want to do anything using the numpy c api, you need a PyArrayObject*.
Edit
I was able to do a reinterpret_cast for 2, for instance:
int owned = PyArray_CHKFLAGS(reinterpret_cast<PyArrayObject *>($result), NPY_ARRAY_OWNDATA);
PyArray_ENABLEFLAGS(reinterpret_cast< PyArrayObject*>($result), NPY_ARRAY_OWNDATA);
owned = PyArray_CHKFLAGS(reinterpret_cast< PyArrayObject *>($result), NPY_ARRAY_OWNDATA);
if (!owned){
throw std::logic_error("PyArrayObject does not own its memory");
}
Still not sure if this is the "right" way to do it.
Regarding 1:
If the data ownership should be passed to the PyObject returned by PyArray_SimpleNewFromData then OWNDATA must be set. In this case the memory must be heap allocated, because stack allocated variables are released as soon as the scope of the variable is left.
PyObject contains a reference count to manage the underlying memory. When the reference count is decremented to 0 then the object is deleted (Py_INCREF,Py_DECREF).
Regarding 2:
PyArrayObject is a struct which contains the variable ob_base of type PyObject.
This PyObject is returned from PyArray_SimpleNewFromData.
Since ob_base is the first variable of PyArrayObject, reinterpret_cast can be used to obtain PyArrayObject from PyObject.
Py_TYPE can be used to check the type of a PyObject.
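A sketch that combines both points (assuming the goal is exactly the snippet in the question; note that with OWNDATA set, NumPy releases the buffer through its own deallocator, which ultimately calls free(), so malloc is used here instead of new[]):
#include <Python.h>
#include <numpy/arrayobject.h>
#include <cstdlib>
#include <cstring>
#include <vector>

static PyObject *vector_to_array(const std::vector<double> &vec)
{
    npy_intp length = static_cast<npy_intp>(vec.size());
    // malloc rather than new[]: the OWNDATA path frees with free()
    // (error checks omitted)
    double *data = static_cast<double *>(std::malloc(length * sizeof(double)));
    std::memcpy(data, vec.data(), length * sizeof(double));

    PyObject *obj = PyArray_SimpleNewFromData(1, &length, NPY_DOUBLE, data);
    // Question 2: the concrete array type is a cast away
    PyArrayObject *arr = reinterpret_cast<PyArrayObject *>(obj);
    // Question 1: hand ownership of the buffer to NumPy
    PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);
    return obj;
}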

PyArray_SimpleNewFromData

So I am trying to write a C function that accepts a numpy array object, extracts the data, does some manipulations and returns another C array as a numpy array object. Everything works seamlessly and I use python wrappers which allow easy manipulation on the python side. However, I am facing a memory leak. I have an output pointer of doubles that I malloc-ed and which I wrap into a Python array object just before returning it to the calling python function,
PyObject *arr;
int nd = 2;
npy_intp dims[] = {5, 10};
double *data = some_function_that_returns_a_double_star(x, y, z);
arr = PyArray_SimpleNewFromData(nd, dims, NPY_DOUBLE, (void *)data);
return arr;
However, this creates a memory leak, because data is never freed. I did some googling and found that this is a common problem in such applications and that the solution is non-trivial. The most helpful resource that I found on this is given here. I could not implement the destructor that this page talks about from the given example. Can someone help me with this? More concretely I am looking for something like,
PyObject *arr;
int nd = 2;
npy_intp dims[] = {5, 10};
double *data = some_function_that_returns_a_double_star(x, y, z);
arr = PyArray_SimpleNewFromData(nd, dims, NPY_DOUBLE, (void *)data);
some_destructor_that_plug_memLeak_due_to_data_star(args);
return arr;
The technique described in the link you didn't understand is a good one: create a Python object that knows how to free your memory when destroyed, and make it the base of the returned array.
It sounds like you might have been overwhelmed by the complexity of creating a new extension type. Fortunately, that's not necessary. Python comes with a type designed to perform arbitrary C-level cleanup when destroyed: capsules, which bundle together a pointer and a destructor function and call the destructor when the capsule is destroyed.
To create a capsule for your memory, first, we define a destructor function:
void capsule_cleanup(PyObject *capsule) {
void *memory = PyCapsule_GetPointer(capsule, NULL);
// I'm going to assume your memory needs to be freed with free().
// If it needs different cleanup, perform whatever that cleanup is
// instead of calling free().
free(memory);
}
And you set a capsule as your array's base with
PyObject *capsule = PyCapsule_New(data, NULL, capsule_cleanup);
PyArray_SetBaseObject((PyArrayObject *) arr, capsule);
// Do not Py_DECREF the capsule; PyArray_SetBaseObject stole your
// reference.
And that should ensure your memory gets freed once it's no longer in use.
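Assembled into the shape the question asked for (inside the same wrapper function, with capsule_cleanup defined as above), that becomes:
npy_intp dims[] = {5, 10};
double *data = some_function_that_returns_a_double_star(x, y, z);
PyObject *arr = PyArray_SimpleNewFromData(2, dims, NPY_DOUBLE, (void *)data);
/* tie the lifetime of data to the array via the capsule */
PyObject *capsule = PyCapsule_New(data, NULL, capsule_cleanup);
PyArray_SetBaseObject((PyArrayObject *)arr, capsule);
return arr;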
While the PyCapsule approach works more generally, you can get numpy to free the memory in the array for you when it's garbage collected by setting the OWNDATA flag.
double *data = some_function_that_returns_a_double_star(x, y, z);
PyObject *arr = PyArray_SimpleNewFromData(nd, dims, NPY_DOUBLE, (void *)data);
PyArray_ENABLEFLAGS((PyArrayObject*) arr, NPY_ARRAY_OWNDATA);

Calling C from Python: passing list of numpy pointers

I have a variable number of numpy arrays, which I'd like to pass to a C function. I managed to pass each individual array (using <ndarray>.ctypes.data_as(c_void_p)), but the number of arrays may vary a lot.
I thought I could pass all of these "pointers" in a list and use the PyList_GetItem() function in the C code. It works like a charm, except that the values of all elements are not the pointers I usually get when they are passed as function arguments.
Though, if I have :
from numpy import array
from ctypes import py_object
a1 = array([1., 2., 3.8])
a2 = array([222.3, 33.5])
values = [a1, a2]
my_cfunc(py_object(values), c_long(len(values)))
And my C code looks like :
void my_cfunc(PyObject *values)
{
int i, n;
n = PyObject_Length(values);
for(i = 0; i < n; i++)
{
unsigned long long *pointer;
pointer = (unsigned long long *)PyList_GetItem(values, i);
printf("value 0 : %f\n", *pointer);
}
}
The printed values are all 0.0000
I have tried a lot of different solutions, using ctypes.byref(), ctypes.pointer(), etc. But I can't seem to be able to retrieve the real pointer values. I even have the impression the values converted by c_void_p() are truncated to 32 bits...
While there is plenty of documentation about passing numpy pointers to C, I haven't seen anything about passing ctypes objects inside a Python list (I admit this may seem strange...).
Any clue?
After a few hours spent reading many pages of documentation and digging in the numpy include files, I've finally managed to understand exactly how it works. Since I've spent a great amount of time searching for these exact explanations, I'm providing the following text to save anyone else from wasting their time.
I repeat the question:
How to transfer a list of numpy arrays, from Python to C
(I also assume you know how to compile, link and import your C module in Python)
Passing a Numpy array from Python to C is rather simple, as long as it's going to be passed as an argument in a C function. You just need to do something like this in Python
from numpy import array
from ctypes import c_long, c_void_p
values = array([1.0, 2.2, 3.3, 4.4, 5.5])
my_c_func(values.ctypes.data_as(c_void_p), c_long(values.size))
And the C code could look like:
void my_c_func(double *values, long size)
{
long i;
for (i = 0; i < size; i++)
printf("%ld : %.10f\n", i, values[i]);
}
That's simple... but what if I have a variable number of arrays? Of course, I could use the techniques which parse the function's argument list (there are many examples on Stack Overflow), but I'd like to do something different.
I'd like to store all my arrays in a list and pass this list to the C function, and let the C code handle all the arrays.
In fact, it's extremely simple, easy and coherent... once you understand how it's done! There is simply one very simple fact to remember:
Any member of a list/tuple/dictionary is a Python object... on the C side of the code!
You can't expect to directly pass a pointer as I initially, and wrongly, thought. Once that is understood, it sounds very simple :-) So, let's write some Python code:
from numpy import array
from ctypes import py_object
my_list = (array([1.0, 2.2, 3.3, 4.4, 5.5]),
array([2.9, 3.8, 4.7, 5.6]))
my_c_func(py_object(my_list))
Well, you don't need to change anything in the list, but you need to specify that you are passing the list as a PyObject argument.
And here is how all of this is accessed in C.
void my_c_func(PyObject *list)
{
int i, n_arrays;
// Get the number of elements in the list
n_arrays = PyObject_Length(list);
for (i = 0; i < n_arrays; i++)
{
PyArrayObject *elem;
double *pd;
elem = (PyArrayObject *) PyList_GetItem(list, i);
pd = PyArray_DATA(elem);
printf("Value 0 : %.10f\n", *pd);
}
}
Explanation:
The list is received as a pointer to a PyObject
We get the number of arrays in the list by using the PyObject_Length() function.
PyList_GetItem() always returns a PyObject * (a completely generic object pointer)
We retrieve the pointer to the array of data by using the PyArray_DATA() macro.
Normally, PyList_GetItem() returns a PyObject *, but if you look in Python.h and ndarraytypes.h, you'll find that PyObject and PyArrayObject are both defined as (I've expanded the macros!):
typedef struct _object {
Py_ssize_t ob_refcnt;
struct _typeobject *ob_type;
} PyObject;
And the PyArrayObject... starts exactly the same way, so at this level the two are perfectly interchangeable. The content of ob_type is accessible from both objects and contains everything needed to manipulate any generic Python object. I admit that I used one of its members during my investigations: the struct member tp_name is the string containing the name of the object, in clear text; and believe me, it helped! This is how I discovered what each list element was containing.
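For reference, that check needs nothing more than Py_TYPE (variable names taken from the code above):
PyObject *item = PyList_GetItem(list, 0);
printf("item type: %s\n", Py_TYPE(item)->tp_name);  /* prints "numpy.ndarray" here */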
While these structures don't appear to contain anything else, how is it that we can access the data pointer of this ndarray object? Simply by using the object macros, which cast to an extended structure, letting the compiler know how to reach the additional fields that come after ob_refcnt and ob_type. The PyArray_DATA() macro is defined as:
#define PyArray_DATA(obj) ((void *)((PyArrayObject_fields *)(obj))->data)
There, it's casting the PyArrayObject * to a PyArrayObject_fields *, and this latter structure is simply (simplified, with macros expanded):
typedef struct tagPyArrayObject_fields {
Py_ssize_t ob_refcnt;
struct _typeobject *ob_type;
char *data;
int nd;
npy_intp *dimensions;
npy_intp *strides;
PyObject *base;
PyArray_Descr *descr;
int flags;
PyObject *weakreflist;
} PyArrayObject_fields;
As you can see, the first two elements of the structure are the same as in PyObject and PyArrayObject, but the additional fields can be addressed using this definition. It is tempting to access these fields directly, but it's a bad and dangerous practice which is strongly discouraged: use the macros instead and don't bother with the details of all these structures. I just thought you might be interested in some of the internals.
Note that all PyArrayObject macros are documented in http://docs.scipy.org/doc/numpy/reference/c-api.array.html
For instance, the size of a PyArrayObject can be obtained using the macro PyArray_SIZE(PyArrayObject *)
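Inside the loop above that could look like this (a small illustrative addition, not part of the original code):
npy_intp n_elements = PyArray_SIZE(elem);   /* total number of elements */
npy_intp first_dim  = PyArray_DIM(elem, 0); /* length of the first axis */
printf("%ld elements, first axis %ld\n", (long)n_elements, (long)first_dim);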
Finally, it's very simple and logical, once you know it :-)

Swig, returning an array of doubles

I know, there are often many ways to solve certain problems. But here I know which way I want to have it, but I am unable to make it work with Python and SWIG...
I have a C-function, which returns me an array of double values:
double *my(int x)
{
double a,b,*buf;
buf = malloc (x * sizeof(double));
a=3.14;
b=2.7;
buf[0]=a;
buf[1]=b;
return buf;
}
Here, I definitively want to have the array as a return value. Not, as in many examples a 'void' function, which writes into an input array. Now, I would like to get a SWIG-python wrapper, which might be used as:
>>> import example
>>> print example.my(7)
[3.14,2.7]
Whatever I do, I seem to have some conceptual problem here - I always get something like <Swig Object of type 'double *' at 0xFABCABA12>
I tried to define some typemaps in my swg file:
%typemap(out) double [ANY] {
int i;
$result = PyList_New($1_dim0);
for (i = 0; i < $1_dim0; i++) {
PyObject *o = PyFloat_FromDouble((double) $1[i]);
PyList_SetItem($result,i,o);
}
}
But still I am unable to get my results out as required. Does anyone have a simple code example to achieve this task?
The first problem is that your typemap doesn't match, you'll need a %typemap(out) double * { ... } since your function returns a pointer to double and not a double array.
If your list is of fixed size (i.e. an integer literal) as in the example you gave (which I assume is not what you want) you could simply change the typemap as I gave above and exchange $1_dim0 for the fixed size.
Otherwise your problem is that your %typemap(out) double * cannot possibly know the value of your parameter int x. You could return a struct that carries both the pointer and the size. Then you can easily define a typemap to turn that into a list (or a NumPy array, see also my response to Wrap C struct with array member for access in python: SWIG? cython? ctypes?).
Incidentally it's not possible to return a fixed sized array in C (see also this answer: Declaring a C function to return an array), so a %typemap(out) double [ANY] { ... } can never match.
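A rough sketch of the struct-plus-typemap idea mentioned above (all names here are hypothetical, not taken from the question):
/* mybuf.h (hypothetical) would contain:
 *   typedef struct { double *data; int size; } DoubleBuffer;
 *   DoubleBuffer make_values(int n);   -- returns a malloc'd buffer
 */
%module example

%{
#include <stdlib.h>
#include "mybuf.h"
%}

/* Turn a returned DoubleBuffer into a Python list of floats and
   release the malloc'd buffer once its contents have been copied. */
%typemap(out) DoubleBuffer {
    int i;
    $result = PyList_New($1.size);
    for (i = 0; i < $1.size; i++) {
        PyList_SetItem($result, i, PyFloat_FromDouble($1.data[i]));
    }
    free($1.data);
}

%include "mybuf.h"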
I suffered a similar problem and solved it in the following way.
// example.i
%module example
%include "carrays.i"
%array_class(float, floatArray);
float * FloatArray(int N);
float SumFloats(float * f);
# ipython
> a = example.floatArray(23) # array generated by swig's class constructor
> a
<example.floatArray; proxy of <Swig Object of type 'floatArray *' at 0x2e74180> >
> a[0]
-2.6762280573445764e-37 # unfortunately it is created uninitialized..
> b = example.FloatArray(23) # array generated by function
> b
<Swig Object of type 'float *' at 0x2e6ad80>
> b[0]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
# .....
TypeError: 'SwigPyObject' object is not subscriptable
> #But there is a way to access b!!
> p = example.floatArray_frompointer(b) # I found this function by typing example. and hitting Tab twice
> p
<example.floatArray; proxy of <Swig Object of type 'floatArray *' at 0x2e66750> >
> p[0]
0.0
> p[0] = 42
> p[0]
42.0
Fortunately, all of these types (float *, floatArray *, and proxy of floatArray *) may be successfully passed to a C++ function (such as SumFloats).
You might want to look at the documentation around carrays.i:
%include "carrays.i"
%array_class(int, intArray);
http://www.swig.org/Doc2.0/Python.html#Python_nn48
If you don't mind pulling in the numpy python module in your python code, you can do the following:
In the SWIG interface file:
%{
#define SWIG_FILE_WITH_INIT
%}
%include "numpy.i"
%init %{
import_array();
%}
%apply(float ARGOUT_ARRAY1[ANY]) {(float outarray1d[9])};
void rf(float outarray1d[9]);
Only the last two lines are specific to this example; the rest is the standard boilerplate for numpy.i (see the numpy.i documentation: http://docs.scipy.org/doc/numpy/reference/swig.interface-file.html).
In the C file (can also be inlined in .i file):
void rf(float outarray1d[9]) {
float _internal_rf[9];
/* ... */
memcpy(outarray1d, _internal_rf, 9*sizeof(float));
}
Then you have a function which you can call from python as
import mymodule
a = mymodule.rf()
# a is a numpy array of float32's, with len 9
Now, if you don't want to be forced to pull in the numpy module in your python project, then I suggest you check numpy.i to see how they do the %typemap trick -- as I understand it, it's done with plain SWIG typemaps and isn't inherently tied to numpy, so it should be possible to do the same trick with tuples or lists as return values.
I don't know how much C you know - so apologies if I'm teaching you to suck eggs here...
There is no array class in plain ol' C. An array is just a block of memory referred to through a pointer, not a "thing" in itself, and therefore cannot just be printed out by itself.
In this case your "buf" is of type "double *". As far as I remember, if you want to print out the actual values stored at the memory pointed to by buf, you have to dereference each one, e.g. (in pseudocode): for i = 0 to buflength print buf[i]
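In C that pseudocode becomes something like this (assuming buflength is known to the caller):
int i;
for (i = 0; i < buflength; i++)
    printf("buf[%d] = %f\n", i, buf[i]);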

In Python, how to use a C++ function which returns an allocated array of structs via a ** parameter?

I'd like to use some existing C++ code, NvTriStrip, in a Python tool.
SWIG easily handles the functions with simple parameters, but the main function, GenerateStrips, is much more complicated.
What do I need to put in the SWIG interface file to indicate that primGroups is really an output parameter and that it must be cleaned up with delete[]?
///////////////////////////////////////////////////////////////////////////
// GenerateStrips()
//
// in_indices: input index list, the indices you would use to render
// in_numIndices: number of entries in in_indices
// primGroups: array of optimized/stripified PrimitiveGroups
// numGroups: number of groups returned
//
// Be sure to call delete[] on the returned primGroups to avoid leaking mem
//
bool GenerateStrips( const unsigned short* in_indices,
const unsigned int in_numIndices,
PrimitiveGroup** primGroups,
unsigned short* numGroups,
bool validateEnabled = false );
FYI, here is the PrimitiveGroup declaration:
enum PrimType
{
PT_LIST,
PT_STRIP,
PT_FAN
};
struct PrimitiveGroup
{
PrimType type;
unsigned int numIndices;
unsigned short* indices;
PrimitiveGroup() : type(PT_STRIP), numIndices(0), indices(NULL) {}
~PrimitiveGroup()
{
if(indices)
delete[] indices;
indices = NULL;
}
};
Have you looked at the documentation of SWIG regarding their "cpointer.i" and "carrays.i" libraries? They're found here. That's how you have to manipulate things unless you want to create your own utility libraries to accompany the wrapped code. Here's the link to the Python handling of pointers with SWIG.
Onto your question on getting it to recognize input versus output. They've got another section in the documentation here that describes exactly that. You label things OUTPUT in the *.i file. So in your case you'd write:
%inline %{
extern bool GenerateStrips( const unsigned short* in_indices,
const unsigned int in_numIndices,
PrimitiveGroup** OUTPUT,
unsigned short* numGroups,
bool validateEnabled );
%}
which gives you a function that returns both the bool and the PrimitiveGroup* array as a tuple.
Does that help?
It's actually so easy to make python bindings for things directly that I don't know why people bother with confusing wrapper stuff like SWIG.
Just use Py_BuildValue once per element of the outer array, producing one tuple per row. Store those tuples in a C array. Then call PyList_New and PyList_SetSlice to generate a list of tuples, and return the list pointer from your C function. A sketch follows below.
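For GenerateStrips, that hand-written route might look roughly like this (a sketch only: error handling is omitted, only the group type and index count are packed into each tuple, and PyList_SetItem is used instead of PyList_SetSlice for brevity; the real wrapper would first parse its Python arguments):
static PyObject *wrap_generate_strips(const unsigned short *in_indices,
                                      unsigned int in_numIndices)
{
    PrimitiveGroup *groups = NULL;
    unsigned short numGroups = 0;
    if (!GenerateStrips(in_indices, in_numIndices, &groups, &numGroups))
        Py_RETURN_NONE;                    // real code would set an exception

    PyObject *list = PyList_New(numGroups);
    for (unsigned short i = 0; i < numGroups; ++i) {
        // one tuple per group: (type, numIndices)
        PyObject *row = Py_BuildValue("(iI)", (int)groups[i].type,
                                      groups[i].numIndices);
        PyList_SetItem(list, i, row);      // steals the reference to row
    }
    delete[] groups;                       // required by the NvTriStrip docs
    return list;
}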
I don't know how to do it with SWIG, but you might want to consider moving to a more modern binding system like Pyrex or Cython.
For example, Pyrex gives you access to C++ delete for cases like this. Here's an excerpt from the documentation:
Disposal
The del statement can be applied to a pointer to a C++ struct
to deallocate it. This is equivalent to delete in C++.
cdef Shrubbery *big_sh
big_sh = new Shrubbery(42.0)
display_in_garden_show(big_sh)
del big_sh
http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/version/Doc/Manual/using_with_c++.html
