I have two related questions regarding creating a numpy array using the C API. Given a newly created numpy array:
std::vector<double> vec({0.1, 0.2});
npy_intp length = vec.size();  // dims must be npy_intp, not int
double* data = new double[length];
std::copy(vec.begin(), vec.end(), data);
PyObject* obj = PyArray_SimpleNewFromData(1, &length, NPY_DOUBLE, static_cast<void*>(data));
How do I ensure proper memory management?
I didn't want to give PyArray_SimpleNewFromData a pointer to vec.data(), since that memory is owned by vec, so I copied the data into newly allocated memory. However, will numpy/python "just know" it needs deleting at the end of the array's lifetime? The docs mention setting the OWNDATA flag; is this appropriate for heap-allocated memory?
How do I get a PyArrayObject* from a PyObject* returned from PyArray_SimpleNewFromData?
All the new array creation mechanisms such as PyArray_SimpleNewFromData return a PyObject*, but if you want to do anything using the numpy c api, you need a PyArrayObject*.
Edit
I was able to use a reinterpret_cast for question 2, for instance:
int owned = PyArray_CHKFLAGS(reinterpret_cast<PyArrayObject *>($result), NPY_ARRAY_OWNDATA);
PyArray_ENABLEFLAGS(reinterpret_cast< PyArrayObject*>($result), NPY_ARRAY_OWNDATA);
owned = PyArray_CHKFLAGS(reinterpret_cast< PyArrayObject *>($result), NPY_ARRAY_OWNDATA);
if (!owned){
throw std::logic_error("PyArrayObject does not own its memory");
}
Still not sure if this is the "right" way to do it.
Regarding 1:
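If ownership of the data should pass to the PyObject returned by PyArray_SimpleNewFromData, then OWNDATA must be set. In that case the memory must be heap allocated, because stack-allocated variables are released as soon as they go out of scope. Be careful, though: NumPy releases an OWNDATA buffer with its own deallocator (historically free(), and recent NumPy versions warn when no memory handler is set), never with delete[], so memory obtained from new[] should not be handed over this way. The approach the NumPy docs suggest is to leave OWNDATA unset and attach an owner object, such as a PyCapsule whose destructor calls delete[], with PyArray_SetBaseObject.
Every PyObject carries a reference count that manages its lifetime (see Py_INCREF and Py_DECREF). When the count drops to 0, the object is deleted, and that is the moment an array frees or releases its data.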
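To see this rule in action without writing any C, here is a small sketch that only holds for CPython (which destroys an object as soon as its count reaches zero). CBuffer is a hypothetical stand-in for an object that owns C memory, and weakref.finalize plays the role of a cleanup hook such as a capsule destructor:

```python
import weakref

class CBuffer:
    """Hypothetical stand-in for an object that owns a C buffer."""

released = []

def make_owner():
    owner = CBuffer()
    # Register cleanup to run when `owner` is destroyed, the way a
    # destructor would free the C memory it owns.
    weakref.finalize(owner, released.append, "buffer released")
    return owner

obj = make_owner()
alias = obj            # a second reference to the same object
del obj                # one reference left: the object is still alive
assert released == []
del alias              # count drops to 0: CPython destroys it right away
```

The cleanup runs exactly once, at the point the last reference disappears, which is the same moment a NumPy array releases its data.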
Regarding 2:
PyArrayObject is a struct whose first member, ob_base, is a PyObject.
The PyObject * returned by PyArray_SimpleNewFromData points at that member.
Since ob_base is the first member of PyArrayObject, the pointer can be converted with reinterpret_cast (a plain cast in C) to obtain a PyArrayObject *.
Py_TYPE can be used to check the type of a PyObject, and PyArray_Check(obj) tells you whether it is a NumPy array before you cast.
I have a variable number of numpy arrays, which I'd like to pass to a C function. I managed to pass each individual array (using <ndarray>.ctypes.data_as(c_void_p)), but the number of arrays may vary a lot.
I thought I could pass all of these "pointers" in a list and use the PyList_GetItem() function in the C code. It works like a charm, except that the values of all elements are not the pointers I usually get when they are passed as function arguments.
For example, if I have:
from numpy import array
from ctypes import py_object, c_long
a1 = array([1., 2., 3.8])
a2 = array([222.3, 33.5])
values = [a1, a2]
my_cfunc(py_object(values), c_long(len(values)))
And my C code looks like :
void my_cfunc(PyObject *values)
{
int i, n;
n = PyObject_Length(values);
for(i = 0; i < n; i++)
{
unsigned long long *pointer;
pointer = (unsigned long long *)(PyList_GetItem(values, i));
printf("value 0 : %f\n", *pointer);
}
}
The printed values are all 0.0000
I have tried a lot of different solutions, using ctypes.byref(), ctypes.pointer(), etc. But I can't seem to be able to retrieve the real pointer values. I even have the impression the values converted by c_void_p() are truncated to 32 bits...
While there is plenty of documentation about passing numpy pointers to C, I haven't found anything about ctypes objects inside a Python list (I admit this may seem strange...).
Any clue ?
After a few hours spent reading documentation and digging through NumPy's include files, I've finally understood exactly how it works. Since I spent a great deal of time searching for these exact explanations, I'm providing the following text to save others that time.
I repeat the question :
How to transfer a list of numpy arrays, from Python to C
(I also assume you know how to compile, link and import your C module in Python)
Passing a Numpy array from Python to C is rather simple, as long as it's passed as an argument of a C function. You just need to do something like this in Python:
from numpy import array
from ctypes import c_long, c_void_p
values = array([1.0, 2.2, 3.3, 4.4, 5.5])
my_c_func(values.ctypes.data_as(c_void_p), c_long(values.size))
And the C code could look like :
#include <stdio.h>
void my_c_func(double *values, long size)
{
long i;
for (i = 0; i < size; i++)
printf("%ld : %.10f\n", i, values[i]);
}
That's simple... but what if I have a variable number of arrays? Of course, I could use the techniques for handling variable-length argument lists (there are many examples on Stack Overflow), but I'd like to do something different.
I'd like to store all my arrays in a list and pass this list to the C function, and let the C code handle all the arrays.
In fact, it's extremely simple, easy and coherent... once you understand how it's done! There is just one simple fact to remember:
Any member of a list/tuple/dictionary is a Python object... on the C side of the code !
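You can't pass the pointers directly, as I initially (and wrongly) thought. Once you know that, it sounds very simple :-) So let's write some Python code:
from numpy import array
from ctypes import py_object
# A real list (not a tuple), because the C code uses PyList_GetItem()
my_list = [array([1.0, 2.2, 3.3, 4.4, 5.5]),
array([2.9, 3.8, 4.7, 5.6])]
my_c_func(py_object(my_list))
You don't need to change anything in the list, but you do need to specify that you are passing it as a PyObject argument. Because the C function calls the Python and NumPy C APIs, load its library with ctypes.PyDLL rather than ctypes.CDLL: CDLL releases the GIL during the call, and the C API must not be used without it.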
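And here is how all of this is accessed in C:
#include <Python.h>
#include <numpy/arrayobject.h>
#include <stdio.h>
void my_c_func(PyObject *list)
{
Py_ssize_t i, n_arrays;
// Get the number of elements in the list
n_arrays = PyObject_Length(list);
for (i = 0; i < n_arrays; i++)
{
// Borrowed reference: no Py_DECREF needed
PyArrayObject *elem = (PyArrayObject *)PyList_GetItem(list, i);
// Assumes every array holds contiguous float64 (C double) data
double *pd = (double *)PyArray_DATA(elem);
printf("Value 0 : %.10f\n", *pd);
}
}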
Explanation :
The list is received as a pointer to a PyObject
We get the number of arrays in the list with the PyObject_Length() function.
PyList_GetItem() returns a borrowed PyObject *, which we cast to PyArrayObject *.
We retrieve the pointer to the array of data by using the PyArray_DATA() macro.
PyList_GetItem() returns a PyObject *, and if you look in Python.h and ndarraytypes.h, you'll find that PyObject is defined as (I've expanded the macros!):
typedef struct _object {
Py_ssize_t ob_refcnt;
struct _typeobject *ob_type;
} PyObject;
And PyArrayObject is exactly the same, so at this level the two are perfectly interchangeable. The ob_type member is accessible for both objects and contains everything needed to manipulate any generic Python object. I admit that I used one of its members during my investigations: tp_name is a string holding the name of the object's type in plain text, and believe me, it helped! This is how I discovered what each list element contained.
If these structures contain nothing else, how can we reach the data pointer of this ndarray object? Through the object macros, which cast to an extended structure so the compiler knows how to access the object's additional members stored after the ob_type pointer. The PyArray_DATA() macro is defined as:
#define PyArray_DATA(obj) ((void *)((PyArrayObject_fields *)(obj))->data)
There, it casts the PyArrayObject * to a PyArrayObject_fields *, and this latter structure is simply (simplified, macros expanded!):
typedef struct tagPyArrayObject_fields {
Py_ssize_t ob_refcnt;
struct _typeobject *ob_type;
char *data;
int nd;
npy_intp *dimensions;
npy_intp *strides;
PyObject *base;
PyArray_Descr *descr;
int flags;
PyObject *weakreflist;
} PyArrayObject_fields;
As you can see, the first two elements of the structure are the same as in PyObject and PyArrayObject, but this definition lets you address the additional members. It is tempting to access these members directly, but it's a bad and dangerous practice that is strongly discouraged. Use the macros instead and don't bother with the details of these structures. I just thought you might be interested in some internals.
Note that all PyArrayObject macros are documented in http://docs.scipy.org/doc/numpy/reference/c-api.array.html
For instance, the number of elements in an array can be obtained with the macro PyArray_SIZE(PyArrayObject *).
Finally, it's very simple and logical, once you know it :-)
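If you would rather keep the Python C API out of the C function entirely, a different technique (not the one described above) is to build a ctypes array of double* pointers plus an array of lengths, so the C side can use a plain signature such as the hypothetical void my_c_func(double **arrays, long *sizes, long n). A minimal sketch using only ctypes, assuming float64 data:

```python
import ctypes
from ctypes import POINTER, c_double, c_long

DoublePtr = POINTER(c_double)

def pack_arrays(arrays):
    """Pack float sequences for a hypothetical C function
        void my_c_func(double **arrays, long *sizes, long n);
    Returns (ptrs, sizes, n, keepalive). Keep `keepalive` referenced until
    the C call returns: ptrs only points into those buffers.
    """
    # One contiguous C double buffer per sequence (this copies the data;
    # with NumPy float64 arrays, a.ctypes.data_as(DoublePtr) avoids the copy).
    keepalive = [(c_double * len(a))(*a) for a in arrays]
    # An array of double* pointers, one per buffer.
    ptrs = (DoublePtr * len(keepalive))(*[ctypes.cast(buf, DoublePtr) for buf in keepalive])
    # An array of lengths, so the C side knows how far each pointer is valid.
    sizes = (c_long * len(keepalive))(*[len(buf) for buf in keepalive])
    return ptrs, sizes, len(keepalive), keepalive

# Usage with a real library (lib and my_c_func are hypothetical):
# lib.my_c_func.argtypes = [POINTER(DoublePtr), POINTER(c_long), c_long]
# lib.my_c_func.restype = None
# lib.my_c_func(ptrs, sizes, n)
ptrs, sizes, n, bufs = pack_arrays([[1.0, 2.2, 3.8], [222.3, 33.5]])
first_values = [ptrs[i][0] for i in range(n)]   # what arrays[i][0] sees in C
```

Since this C function never touches Python objects, a plain ctypes.CDLL is fine here. Declaring argtypes and restype also stops ctypes from guessing C types, a common source of truncated pointer values.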
I am writing a Python app that makes use of PulseAudio API. The implementation is heavily using callbacks written in Python and invoked by PulseAudio's C code.
Most information is passed into the callback through a specific structure, for instance pa_sink_info, which is defined in C as follows:
typedef struct pa_sink_info {
const char *name;
uint32_t index;
const char *description;
pa_sample_spec sample_spec;
pa_channel_map channel_map;
uint32_t owner_module;
pa_cvolume volume;
int mute;
uint32_t monitor_source;
const char *monitor_source_name;
pa_usec_t latency;
const char *driver;
pa_sink_flags_t flags;
pa_proplist *proplist;
pa_usec_t configured_latency;
pa_volume_t base_volume;
pa_sink_state_t state;
uint32_t n_volume_steps;
uint32_t card;
uint32_t n_ports;
pa_sink_port_info** ports;
pa_sink_port_info* active_port;
uint8_t n_formats;
pa_format_info **formats;
} pa_sink_info;
From this structure it's very easy to get scalar values, e.g.:
self.some_proc(
struct.contents.index,
struct.contents.name,
struct.contents.description)
But I have a difficulty dealing with ports and active_port, which in Python are described as:
('n_ports', uint32_t),
('ports', POINTER(POINTER(pa_sink_port_info))),
('active_port', POINTER(pa_sink_port_info)),
Here n_ports specifies number of elements in ports, which is a pointer to array of pointers to structures of type pa_sink_port_info. Actually, I don't even know how I can convert these to Python types at all.
What is the most efficient way of converting ports into Python dictionary containing pa_sink_port_info's?
Solving this problem required careful reading of the Python ctypes reference. Once it was clear how ctypes translates C types, getting to the desired values was not so difficult.
The main idea about pointers is that you use their contents attribute to get to the data the pointer points to. Another useful thing to know is that pointers can be indexed like arrays (it's not validated by the interpreter, so it's your own responsibility to make sure it is indeed an array).
For this particular PulseAudio example, we can process the ports structure member (which is a pointer to array of pointers) as follows:
port_list = []
if struct.contents.ports:
i = 0
while True:
port_ptr = struct.contents.ports[i]
# NULL pointer terminates the array
if port_ptr:
port_struct = port_ptr.contents
port_list.append(port_struct.name)
i += 1
else:
break
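Building on the loop above, here is a sketch that returns a dictionary keyed by port name, as the question asked. PortInfo is a simplified, hypothetical stand-in for pa_sink_port_info (copy the real field list from PulseAudio's headers), and the fake array at the end only exists to exercise the function. The loop uses n_ports as an upper bound and also stops at a NULL entry:

```python
import ctypes
from ctypes import POINTER, Structure, c_char_p, c_uint32

class PortInfo(Structure):
    # Simplified, hypothetical layout: not the full pa_sink_port_info.
    _fields_ = [("name", c_char_p),
                ("description", c_char_p),
                ("priority", c_uint32)]

def ports_to_dict(ports, n_ports):
    """Convert a PortInfo ** array into {name: {description, priority}}.

    Values are copied into plain Python objects, so the result stays valid
    after the C side frees or reuses the structs.
    """
    result = {}
    if not ports:                        # NULL outer pointer: no ports at all
        return result
    for i in range(n_ports):             # indexing is unchecked: stay below n_ports
        port_ptr = ports[i]
        if not port_ptr:                 # the array is also NULL-terminated
            break
        port = port_ptr.contents         # dereference PortInfo* to the struct
        result[port.name.decode()] = {
            "description": (port.description or b"").decode(),
            "priority": port.priority,
        }
    return result

# Fake data standing in for what the C library would hand to the callback.
speaker = PortInfo(b"analog-output-speaker", b"Speakers", 100)
phones = PortInfo(b"analog-output-headphones", b"Headphones", 90)
port_array = (POINTER(PortInfo) * 3)(ctypes.pointer(speaker), ctypes.pointer(phones), None)
ports = ctypes.cast(port_array, POINTER(POINTER(PortInfo)))
ports_dict = ports_to_dict(ports, 2)
```

In the callback you would call ports_to_dict(struct.contents.ports, struct.contents.n_ports); active_port, when non-NULL, is dereferenced the same way through .contents.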
I am running into some problems and would like some help. I have a piece of code that embeds a Python script. This script contains a function that expects to receive an array as an argument (in this case I am using a numpy array within the script).
I would like to know how I can pass an array from C to the embedded Python script as an argument to that function. More specifically, can someone show me a simple example of this?
Really, the best answer here is probably to use numpy arrays exclusively, even from your C code. But if that's not possible, then you have the same problem as any code that shares data between C types and Python types.
In general, there are at least five options for sharing data between C and Python:
Create a Python list or other object to pass.
Define a new Python type (in your C code) to wrap and represent the array, with the same methods you'd define for a sequence object in Python (__getitem__, etc.).
Cast the pointer to the array to intptr_t, or to an explicit ctypes type, or just leave it un-cast; then use ctypes on the Python side to access it.
Cast the pointer to the array to const char * and pass it as a str (or, in Py3, bytes), and use struct or ctypes on the Python side to access it.
Create an object matching the buffer protocol, and again use struct or ctypes on the Python side.
In your case, you want to use numpy.arrays in Python. So, the general cases become:
Create a numpy.array to pass.
(probably not appropriate)
Pass the pointer to the array as-is, and from Python, use ctypes to get it into a type that numpy can convert into an array.
Cast the pointer to the array to const char * and pass it as a str (or, in Py3, bytes), which is already a type that numpy can convert into an array.
Create an object matching the buffer protocol, and which again I believe numpy can convert directly.
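On the Python side, options 3 and 5 boil down to wrapping memory you did not allocate. A minimal sketch, assuming the C code passes the array's address as an integer (for example an intptr_t converted with PyLong_FromVoidPtr) together with its length; doubles_from_address is a hypothetical helper, and the C array is simulated with ctypes:

```python
import ctypes

def doubles_from_address(addr, n):
    """Hypothetical helper: view n C doubles starting at integer address addr.

    No data is copied, so the C side must keep the memory alive (and not
    free it) for as long as the view is in use.
    """
    return (ctypes.c_double * n).from_address(addr)

# Simulate the C side: a C array of doubles and its address, as an
# intptr_t passed into Python would carry it.
c_array = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)
addr = ctypes.addressof(c_array)

view = doubles_from_address(addr, 4)
c_array[0] = 10.0           # changes made on the "C side" show up in the view
buf = memoryview(view)      # the view also exposes the buffer protocol
# With NumPy installed, numpy.frombuffer(view, dtype=numpy.float64) or
# numpy.ctypeslib.as_array(view) wraps the same memory as an ndarray.
```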
For 1, here's how to do it with a list, just because it's a very simple example (and I already wrote it…):
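PyObject *makelist(int array[], size_t size) {
PyObject *l = PyList_New((Py_ssize_t)size);
for (size_t i = 0; i != size; ++i) {
// PyList_SET_ITEM steals the new reference (Python 2 code used PyInt_FromLong)
PyList_SET_ITEM(l, (Py_ssize_t)i, PyLong_FromLong(array[i]));
}
return l;
}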
And here's the numpy.array equivalent (assuming you can rely on the C array not to be deleted—see Creating arrays in the docs for more details on your options here):
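PyObject *makearray(int array[], size_t size) {
npy_intp dim = (npy_intp)size;  // dims are npy_intp, not npy_int
// No copy is made, so the C array must outlive the returned ndarray
return PyArray_SimpleNewFromData(1, &dim, NPY_INT, (void *)array);
}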
At any rate, however you do this, you will end up with something that looks like a PyObject * from C (and has a single refcount), so you can pass it as a function argument, while on the Python side it will look like a numpy.array, list, bytes, or whatever else is appropriate.
Now, how do you actually pass function arguments? Well, the sample code in Pure Embedding that you referenced in your comment shows how to do this, but doesn't really explain what's going on. There's actually more explanation in the extending docs than the embedding docs, specifically, Calling Python Functions from C. Also, keep in mind that the standard library source code is chock full of examples of this (although some of them aren't as readable as they could be, either because of optimization, or just because they haven't been updated to take advantage of new simplified C API features).
Skip the first example about getting a Python function from Python, because presumably you already have that. The second example (and the paragraph right above it) shows the easy way to do it: creating an argument tuple with Py_BuildValue. So, let's say we want to call a function you've got stored in myfunc with the list mylist returned by the makelist function above. Here's what you do:
if (!PyCallable_Check(myfunc)) {
PyErr_SetString(PyExc_TypeError, "function is not callable?!");
return NULL;
}
PyObject *arglist = Py_BuildValue("(O)", mylist);
PyObject *result = PyObject_CallObject(myfunc, arglist);
Py_DECREF(arglist);
return result;
You can skip the callable check if you're sure you've got a valid callable object, of course. (And it's usually better to check when you first get myfunc, if appropriate, because you can give both earlier and better error feedback that way.)
If you want to actually understand what's going on, try it without Py_BuildValue. As the docs say, the second argument to PyObject_CallObject is a tuple, and PyObject_CallObject(callable_object, args) is equivalent to apply(callable_object, args) in Python 2, which is equivalent to callable_object(*args). So, if you wanted to call myfunc(mylist) in Python, you have to turn that into, effectively, myfunc(*(mylist,)) so you can translate it to C. You can construct a tuple like this:
PyObject *arglist = PyTuple_Pack(1, mylist);
But usually, Py_BuildValue is easier (especially if you haven't already packed everything up as Python objects), and the intention in your code is clearer (just as using PyArg_ParseTuple is simpler and clearer than using explicit tuple functions in the other direction).
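The tuple rule is easy to check from Python itself by calling PyObject_CallObject through ctypes.pythonapi. A minimal sketch; myfunc and mylist are illustrative, and ctypes.PYFUNCTYPE is used because it keeps the GIL held during the call and re-raises any Python exception the C function sets:

```python
import ctypes

# Prototype for: PyObject *PyObject_CallObject(PyObject *callable, PyObject *args)
# A py_object restype is treated by ctypes as a new reference, matching this API.
_proto = ctypes.PYFUNCTYPE(ctypes.py_object, ctypes.py_object, ctypes.py_object)
PyObject_CallObject = _proto(("PyObject_CallObject", ctypes.pythonapi))

def myfunc(values):
    return sum(values)

mylist = [1, 2, 3]
args = (mylist,)            # the 1-tuple that Py_BuildValue("(O)", mylist) builds
result = PyObject_CallObject(myfunc, args)   # same as myfunc(*args)
```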
So, how do you get that myfunc? Well, if you've created the function from the embedding code, just keep the pointer around. If you want it passed in from the Python code, that's exactly what the first example does. If you want to, e.g., look it up by name from a module or other context, the APIs for concrete types like PyModule and abstract types like PyMapping are pretty simple, and it's generally obvious how to convert Python code into the equivalent C code, even if the result is mostly ugly boilerplate.
Putting it all together, let's say I've got a C array of integers, and I want to import mymodule and call a function mymodule.myfunc(mylist) that returns an int. Here's a stripped-down example (not actually tested, and no error handling, but it should show all the parts):
int callModuleFunc(int array[], size_t size) {
PyObject *mymodule = PyImport_ImportModule("mymodule");
PyObject *myfunc = PyObject_GetAttrString(mymodule, "myfunc");
PyObject *mylist = PyList_New(size);
for (size_t i = 0; i != size; ++i) {
PyList_SET_ITEM(mylist, (Py_ssize_t)i, PyLong_FromLong(array[i]));
}
PyObject *arglist = Py_BuildValue("(O)", mylist);
PyObject *result = PyObject_CallObject(myfunc, arglist);
int retval = (int)PyLong_AsLong(result);
Py_DECREF(result);
Py_DECREF(arglist);
Py_DECREF(mylist);
Py_DECREF(myfunc);
Py_DECREF(mymodule);
return retval;
}
If you're using C++, you probably want to look into some kind of scope-guard/janitor/etc. to handle all those Py_DECREF calls, especially once you start doing proper error handling (which usually means early return NULL calls peppered through the function). With C++11, std::unique_ptr<PyObject, decltype(&Py_DecRef)>, constructed with &Py_DecRef as its deleter, may be all you need.
But really, a better way to reduce all that ugly boilerplate, if you plan to do a lot of C<->Python communication, is to look at the familiar frameworks designed to make extending Python easier: Cython, boost::python, etc. Even though you're embedding, you're effectively doing the same work as extending, so they can help in the same ways.
For that matter, some of them also have tools to help the embedding part, if you search around the docs. For example, you can write your main program in Cython, using both C code and Python code, and cython --embed. You may want to cross your fingers and/or sacrifice some chickens, but if it works, it's amazingly simple and productive. Boost isn't nearly as trivial to get started, but once you've got things together, almost everything is done in exactly the way you'd expect, and just works, and that's just as true for embedding as extending. And so on.
The Python function will need a Python object to be passed in. Since you want that Python object to be a NumPy array, you should use one of the NumPy C-API functions for creating arrays; PyArray_SimpleNewFromData() is probably a good start. It will use the buffer provided, without copying the data.
That said, it is almost always easier to write the main program in Python and use a C extension module for the C code. This approach makes it easier to let Python do the memory management, and the ctypes module together with NumPy's ctypes support (ndarray.ctypes and numpy.ctypeslib) makes it easy to pass a NumPy array to a C function.
I'd like to use some existing C++ code, NvTriStrip, in a Python tool.
SWIG easily handles the functions with simple parameters, but the main function, GenerateStrips, is much more complicated.
What do I need to put in the SWIG interface file to indicate that primGroups is really an output parameter and that it must be cleaned up with delete[]?
///////////////////////////////////////////////////////////////////////////
// GenerateStrips()
//
// in_indices: input index list, the indices you would use to render
// in_numIndices: number of entries in in_indices
// primGroups: array of optimized/stripified PrimitiveGroups
// numGroups: number of groups returned
//
// Be sure to call delete[] on the returned primGroups to avoid leaking mem
//
bool GenerateStrips( const unsigned short* in_indices,
const unsigned int in_numIndices,
PrimitiveGroup** primGroups,
unsigned short* numGroups,
bool validateEnabled = false );
FYI, here is the PrimitiveGroup declaration:
enum PrimType
{
PT_LIST,
PT_STRIP,
PT_FAN
};
struct PrimitiveGroup
{
PrimType type;
unsigned int numIndices;
unsigned short* indices;
PrimitiveGroup() : type(PT_STRIP), numIndices(0), indices(NULL) {}
~PrimitiveGroup()
{
if(indices)
delete[] indices;
indices = NULL;
}
};
Have you looked at the SWIG documentation for its "cpointer.i" and "carrays.i" libraries? They're described in the SWIG Library chapter of the manual. That's how you have to manipulate pointers unless you want to write your own utility libraries to accompany the wrapped code, and the Python chapter of the manual covers how SWIG handles pointers in Python.
As for getting SWIG to recognize input versus output parameters, another section of the documentation describes exactly that: you label parameters OUTPUT in the .i file (the OUTPUT typemaps come from typemaps.i). So in your case you'd write:
%include "typemaps.i"
%inline %{
extern bool GenerateStrips( const unsigned short* in_indices,
const unsigned int in_numIndices,
PrimitiveGroup** OUTPUT,
unsigned short* numGroups,
bool validated );
%}
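The intent is a function that returns both the bool and the PrimitiveGroup* array as a tuple. Be aware, though, that typemaps.i only defines OUTPUT typemaps for simple pointer types such as unsigned short*, so PrimitiveGroup** will most likely need a custom argout typemap, which is also the natural place to call delete[].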
Does that help?
It's actually so easy to make python bindings for things directly that I don't know why people bother with confusing wrapper stuff like SWIG.
Just use Py_BuildValue once per element of the outer array, producing one tuple per row. Store those tuples in a C array. Then call PyList_New and PyList_SetSlice to generate a list of tuples, and return the list pointer from your C function.
I don't know how to do it with SWIG, but you might want to consider moving to a more modern binding system like Pyrex or Cython.
For example, Pyrex gives you access to C++ delete for cases like this. Here's an excerpt from the documentation:
Disposal
The del statement can be applied to a pointer to a C++ struct
to deallocate it. This is equivalent to delete in C++.
cdef Shrubbery *big_sh
big_sh = new Shrubbery(42.0)
display_in_garden_show(big_sh)
del big_sh
http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/version/Doc/Manual/using_with_c++.html