Question:
After about a week of experimenting, I'm at a fork in the road: either keep trying to get my ctypes implementation working, or look for a less complicated approach. I'm mainly looking for feedback on whether ctypes is the right tool for the job, or whether there's an easier path, such as one of the options mentioned in this post. Given that I only need the structure definitions and will not actually call any of the C functions exposed in the header, maybe one of those options would be the path of least resistance. I'm hoping someone with more experience in this area can nudge me in the right direction.
Background
I have some nested structures defined in a C header file which I am trying to consume from Python. My focus is only on the structures themselves; I do not need to call any C functions declared in that header. I just need the typedefs, the associated data structures, and any #defines those structures depend on, e.g. a #define indicating the size of an array.
Goal
The main purpose of this Python script is to parse a C-header file and serialize it into a Python equivalent object, which I can then modify for my own use. For example, if the header defined a struct like this:
#ifndef DIAG_H
#define DIAG_H

#include <stdint.h>

typedef struct
{
    uint32_t CounterA;
    uint32_t CounterB;
    uint32_t CounterC;
    uint32_t CounterD;
    uint32_t CounterE;
    uint32_t WriteCount;
} CounterCfg_t;

typedef struct
{
    uint16_t Id;
    uint16_t Count;
    uint32_t stamp;
    float voltage;
    CounterCfg_t countCfg;
} Entry_t;

typedef struct
{
    uint8_t major;
    uint8_t minor;
    uint8_t patch;
    uint8_t beta;
} Cfg_Version_t;

typedef struct
{
    float offsetB;
    float offsetA;
} Offsets_t;

typedef struct
{
    Offsets_t offsets[10];
} Cal_t;

typedef struct
{
    float alpha;
    uint32_t crc;
    Cal_t cal;
    Cfg_Version_t version;
    Entry_t logs[40];
} Sample_t;

#endif
I would like to be able to parse this header on the Python side and use the equivalent object like this (pseudocode):
foo = Sample_t()
foo.logs[1].countCfg.WriteCount = 50
foo.cal.offsets[2].offsetB = 0.012345
The header file will change frequently as other devs update it, so writing an equivalent object by hand is not an option, which is why I pursued the parsing route: if someone updates the header, I simply re-run this script instead of hand-editing.
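For reference, the ctypes equivalent that a generator (or a hand port) produces for the header above boils down to something like the sketch below. The names simply mirror the header; the one assumption is that ctypes' native alignment matches the target's default struct packing (if the C side uses packed structs, a `_pack_` attribute would be needed):

```python
import ctypes

# Mirrors of the structs in DIAG_H; field order and types must match the header.
class CounterCfg_t(ctypes.Structure):
    _fields_ = [("CounterA", ctypes.c_uint32),
                ("CounterB", ctypes.c_uint32),
                ("CounterC", ctypes.c_uint32),
                ("CounterD", ctypes.c_uint32),
                ("CounterE", ctypes.c_uint32),
                ("WriteCount", ctypes.c_uint32)]

class Entry_t(ctypes.Structure):
    _fields_ = [("Id", ctypes.c_uint16),
                ("Count", ctypes.c_uint16),
                ("stamp", ctypes.c_uint32),
                ("voltage", ctypes.c_float),
                ("countCfg", CounterCfg_t)]

class Cfg_Version_t(ctypes.Structure):
    _fields_ = [("major", ctypes.c_uint8),
                ("minor", ctypes.c_uint8),
                ("patch", ctypes.c_uint8),
                ("beta", ctypes.c_uint8)]

class Offsets_t(ctypes.Structure):
    _fields_ = [("offsetB", ctypes.c_float),
                ("offsetA", ctypes.c_float)]

class Cal_t(ctypes.Structure):
    _fields_ = [("offsets", Offsets_t * 10)]

class Sample_t(ctypes.Structure):
    _fields_ = [("alpha", ctypes.c_float),
                ("crc", ctypes.c_uint32),
                ("cal", Cal_t),
                ("version", Cfg_Version_t),
                ("logs", Entry_t * 40)]

# The pseudocode access pattern from the question works as-is:
foo = Sample_t()
foo.logs[1].countCfg.WriteCount = 50
foo.cal.offsets[2].offsetB = 0.012345
```

The pain point is not writing this once, but keeping it in sync with the header as it changes, which is exactly what ctypesgen automates.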
What I have tried:
I went down the path of using ctypesgen to generate ctypes-based wrappers and then importing them into my script. ctypesgen is amazing and does all the hard work for me; however, things have become pretty complicated. I now have a 6K-line auto-generated wrapper being pulled in, and looking at that file, I wonder if this is overkill for what I am trying to do.
To demonstrate where I'm at right now, I made a quick barebones example where I try traversing some nested structures. The example includes the ctypesgen-generated wrappers, based on a simple header I provided, along with the source to walk the structure. As you can see from the code, this requires me to access protected members such as _fields_, _type_, _length_, _obj, etc., and because of that, it feels like I am doing something wrong in general. That, coupled with my lack of ctypes knowledge, has brought me here, to see if there's an easier way, or maybe a different library for my use case.
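For what it's worth, recursing over _fields_ is the standard way to introspect ctypes structures, so it need not feel wrong. A minimal walker (using small hypothetical structs here rather than the generated wrappers) might look like:

```python
import ctypes

# Hypothetical nested structs standing in for the ctypesgen-generated ones.
class Inner(ctypes.Structure):
    _fields_ = [("x", ctypes.c_uint32), ("y", ctypes.c_float)]

class Outer(ctypes.Structure):
    _fields_ = [("id", ctypes.c_uint16),
                ("inner", Inner),
                ("values", ctypes.c_uint8 * 4)]

def walk(ctype, prefix=""):
    """Yield a dotted path for every leaf field of a ctypes Structure/Array."""
    if issubclass(ctype, ctypes.Structure):
        for name, ftype in ctype._fields_:
            yield from walk(ftype, prefix + "." + name if prefix else name)
    elif issubclass(ctype, ctypes.Array):
        for i in range(ctype._length_):
            yield from walk(ctype._type_, "%s[%d]" % (prefix, i))
    else:
        yield prefix

paths = list(walk(Outer))
print(paths)
# ['id', 'inner.x', 'inner.y', 'values[0]', 'values[1]', 'values[2]', 'values[3]']
```

The underscored names (_fields_, _type_, _length_) are documented ctypes attributes, not private implementation details, despite the naming convention.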
My Attempt at walking the nested structures
Thanks!
2017/06/13 EDIT:
I tried using Boost as was suggested, but after spending more than three days trying to get it to compile and link, and failing, I decided that the stupid painful way was probably the fastest and least painful... so now my code just saves a mess of gigantic text files (splitting arrays and the real/imaginary parts of the numbers across files) that C++ then reads. Elegant... no. Effective... yes.
I have some scientific code, currently written in Python, that is being slowed down by a numerical 3d integration step within a loop. To overcome this I am re-writing this particular step in C++. (Cython etc is not an option).
Long story short: I want to transfer several very large arrays of complex numbers from the python code to the C++ integrator as conveniently and painlessly as possible. I could do this manually and painfully using text or binary files - but before I embark on this, I was wondering if I have any better options?
I'm using Visual Studio for C++ and Anaconda for Python (not my choice!)
Is there any file format or method that would make it quick and convenient to save an array of complex numbers from python and then recreate it in C++?
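One candidate, sketched here on the Python side only: numpy's raw binary I/O. A complex128 array is stored as interleaved real/imaginary doubles, which is the same memory layout as std::complex&lt;double&gt;, so a C++ reader can fread the payload straight into a std::vector of std::complex&lt;double&gt;. The file name and the tiny shape header below are my own convention, not a standard format:

```python
import os
import struct
import tempfile

import numpy as np

arr = (np.random.randn(4, 8) + 1j * np.random.randn(4, 8)).astype(np.complex128)

path = os.path.join(tempfile.mkdtemp(), "field.bin")
with open(path, "wb") as f:
    # ad-hoc header: number of dims, then each dim, as little-endian uint64
    f.write(struct.pack("<Q", arr.ndim))
    f.write(struct.pack("<%dQ" % arr.ndim, *arr.shape))
    # payload: interleaved (real, imag) float64 pairs - std::complex<double> layout
    arr.tofile(f)

# read it back the same way a C++ reader would
with open(path, "rb") as f:
    ndim, = struct.unpack("<Q", f.read(8))
    shape = struct.unpack("<%dQ" % ndim, f.read(8 * ndim))
    back = np.fromfile(f, dtype=np.complex128).reshape(shape)

print(np.array_equal(arr, back))
```

The .npy format (np.save) does essentially the same thing with a standardized header, if you would rather not invent one.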
Many thanks,
Ben
An easy solution that I have used many times is to build your "C++ side" as a DLL (a shared object on Linux/OS X), provide a simple, C-like entry point (plain integers, pointers & co., no STL stuff) and pass the data through ctypes.
This avoids boost/SIP/Swig/... build nightmares, can be kept zero-copy (with ctypes you can pass a straight pointer to your numpy data) and allows you to do whatever you want on the C++ side (especially on the build side: no friggin' distutils, no Boost, no nothing; build it with whatever can produce a C-like DLL). It also has the nice side effect of making your C++ algorithm callable from other languages (virtually any language has some way to interface with C libraries).
Here's a quick artificial example. The C++ side is just:
extern "C" {
    double sum_it(double *array, int size) {
        double ret = 0.;
        for(int i = 0; i < size; ++i) {
            ret += array[i];
        }
        return ret;
    }
}
This has to be compiled to a dll (on Windows) or a .so (on Linux), making sure to export the sum_it function (automatic with gcc, requires a .def file with VC++).
On the Python side, we can have a wrapper like
import ctypes
import os
import sys

import numpy as np

path = os.path.dirname(__file__)
cdll = ctypes.CDLL(os.path.join(path, "summer.dll" if sys.platform.startswith("win") else "summer.so"))
_sum_it = cdll.sum_it
_sum_it.restype = ctypes.c_double
_sum_it.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_int]

def sum_it(l):
    if isinstance(l, np.ndarray) and l.dtype == np.float64 and l.ndim == 1:
        # it's already a numpy array with the right features - go zero-copy
        a = l.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
    else:
        # it's a list or something else - create a ctypes copy
        arr_t = ctypes.c_double * len(l)
        a = arr_t(*l)
    return _sum_it(a, len(l))
which makes sure that the data is marshaled correctly; then invoking the function is as trivial as
import summer
import numpy as np
# from a list (with copy)
print(summer.sum_it([1, 2, 3, 4.5]))
# from a numpy array of the right type - zero-copy
print(summer.sum_it(np.array([3., 4., 5.])))
See the ctypes documentation for more information on how to use it. See also the relevant documentation in numpy.
For complex numbers, the situation is slightly more complicated, as there's no builtin for it in ctypes; if we want to use std::complex<double> on the C++ side (which is pretty much guaranteed to work fine with the numpy complex layout, namely a sequence of two doubles), we can write the C++ side as:
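That layout claim is easy to verify from Python before committing to it: viewing a complex128 array as float64 exposes the interleaved real/imaginary pairs without copying anything.

```python
import numpy as np

arr = np.array([1. + 2.j, 3. + 4.j])   # dtype complex128
as_doubles = arr.view(np.float64)      # reinterpret the same buffer, no copy

print(as_doubles)    # [1. 2. 3. 4.] - real, imag, real, imag
print(arr.itemsize)  # 16 bytes per element = two 8-byte doubles
```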
#include <complex>

extern "C" {
    std::complex<double> sum_it_cplx(std::complex<double> *array, int size) {
        std::complex<double> ret(0., 0.);
        for(int i = 0; i < size; ++i) {
            ret += array[i];
        }
        return ret;
    }
}
Then, on the Python side, we have to replicate that complex layout in a ctypes structure to retrieve the return value (or to be able to build complex arrays without numpy):
class c_complex(ctypes.Structure):
    # Complex number, compatible with the std::complex<double> layout
    _fields_ = [("real", ctypes.c_double), ("imag", ctypes.c_double)]

    def __init__(self, pycomplex):
        # Init from a Python complex
        self.real = pycomplex.real
        self.imag = pycomplex.imag

    def to_complex(self):
        # Convert to a Python complex
        return self.real + (1.j) * self.imag
Inheriting from ctypes.Structure enables the ctypes marshalling magic, which is performed according to the _fields_ member; the constructor and extra methods are just for ease of use on the Python side.
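A quick sanity check (re-declaring the structure here so the snippet stands alone) confirms that ctypes computes the expected 16-byte, two-double layout:

```python
import ctypes

class c_complex(ctypes.Structure):
    # same declaration as in the wrapper above
    _fields_ = [("real", ctypes.c_double), ("imag", ctypes.c_double)]

print(ctypes.sizeof(c_complex))  # 16, like sizeof(std::complex<double>)
print(c_complex.real.offset)     # 0: real comes first
print(c_complex.imag.offset)     # 8: imag sits right after it
```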
Then, we have to tell ctypes the return type
_sum_it_cplx = cdll.sum_it_cplx
_sum_it_cplx.restype = c_complex
and finally write our wrapper, in a similar fashion to the previous one:
def sum_it_cplx(l):
    if isinstance(l, np.ndarray) and l.dtype == np.complex128 and l.ndim == 1:
        # the numpy array layout for complexes (sequence of two doubles) is already
        # compatible with std::complex (see https://stackoverflow.com/a/5020268/214671)
        a = l.ctypes.data_as(ctypes.POINTER(c_complex))
    else:
        # otherwise, build an array of our c_complex
        arr_t = c_complex * len(l)
        a = arr_t(*(c_complex(r) for r in l))
    ret = _sum_it_cplx(a, len(l))
    return ret.to_complex()
Testing it as above
# from a complex list (with copy)
print(summer.sum_it_cplx([1. + 0.j, 0 + 1.j, 2 + 2.j]))
# from a numpy array of the right type - zero-copy
print(summer.sum_it_cplx(np.array([1. + 0.j, 0 + 1.j, 2 + 2.j])))
yields the expected results:
(3+3j)
(3+3j)
I see the OP is over a year old now, but I recently addressed a similar problem using the native Python C API and its numpy C extension. Since I personally don't enjoy using ctypes for various reasons (e.g., the complex-number workarounds, messy code), nor Boost, I wanted to post my answer for future searchers.
Documentation for both the Python C API and the numpy C API is quite extensive (albeit a little overwhelming at first). But after one or two successes, writing native C/C++ extensions becomes very easy.
Here is an example C++ function that can be called from Python. It integrates a 3D numpy array of either real or complex (numpy.double or numpy.cdouble) type, and is imported from the compiled extension module cintegrate.so.
#include "Python.h"
#include "numpy/arrayobject.h"
#include <math.h>

static PyObject * integrate3(PyObject * module, PyObject * args)
{
    PyObject * argy = NULL;      // Regular Python/C API
    PyArrayObject * yarr = NULL; // Extended Numpy/C API
    double dx, dy, dz;

    // "O" format -> read argument as a PyObject type into argy (Python/C API)
    if (!PyArg_ParseTuple(args, "Oddd", &argy, &dx, &dy, &dz))
    {
        PyErr_SetString(PyExc_ValueError, "Error parsing arguments.");
        return NULL;
    }

    // Determine if it's a complex number array (Numpy/C API)
    int DTYPE = PyArray_ObjectType(argy, NPY_FLOAT);
    int iscomplex = PyTypeNum_ISCOMPLEX(DTYPE);

    // parse python object into numpy array (Numpy/C API)
    yarr = (PyArrayObject *)PyArray_FROM_OTF(argy, DTYPE, NPY_ARRAY_IN_ARRAY);
    if (yarr == NULL) {
        Py_INCREF(Py_None);
        return Py_None;
    }

    // just assume a 3-dimensional array here... you can generalize to N dims
    if (PyArray_NDIM(yarr) != 3) {
        Py_CLEAR(yarr);
        PyErr_SetString(PyExc_ValueError, "Expected 3 dimensional integrand");
        return NULL;
    }

    npy_intp * dims = PyArray_DIMS(yarr);
    npy_intp i, j, k;
    double * p;

    // initialize variable to hold the result
    Py_complex result = {.real = 0, .imag = 0};

    if (iscomplex) {
        for (i = 0; i < dims[0]; i++)
            for (j = 0; j < dims[1]; j++)
                for (k = 0; k < dims[2]; k++) {
                    p = (double*)PyArray_GETPTR3(yarr, i, j, k);
                    result.real += *p;
                    result.imag += *(p + 1);
                }
    } else {
        for (i = 0; i < dims[0]; i++)
            for (j = 0; j < dims[1]; j++)
                for (k = 0; k < dims[2]; k++) {
                    p = (double*)PyArray_GETPTR3(yarr, i, j, k);
                    result.real += *p;
                }
    }

    // multiply by the step sizes
    result.real *= (dx * dy * dz);
    result.imag *= (dx * dy * dz);

    Py_CLEAR(yarr);

    // copy result into a returnable type with a new reference
    if (iscomplex) {
        return Py_BuildValue("D", &result);
    } else {
        return Py_BuildValue("d", result.real);
    }
}
Simply put that into a source file (we'll call it cintegrate.cxx) along with the module definition stuff, inserted at the bottom:
static PyMethodDef cintegrate_Methods[] = {
    {"integrate3", integrate3, METH_VARARGS,
     "Pass 3D numpy array (double or complex) and dx,dy,dz step sizes. Returns the Riemann integral."},
    {NULL, NULL, 0, NULL} /* Sentinel */
};

static struct PyModuleDef module = {
    PyModuleDef_HEAD_INIT,
    "cintegrate",   /* name of module */
    NULL,           /* module documentation, may be NULL */
    -1,             /* size of per-interpreter state of the module,
                       or -1 if the module keeps state in global variables. */
    cintegrate_Methods
};

PyMODINIT_FUNC PyInit_cintegrate(void)
{
    import_array();  // initialize the numpy C API before using it
    return PyModule_Create(&module);
}
Then compile that via setup.py, much like Walter's Boost example, with just a couple of obvious changes: replacing file.cc there with our file cintegrate.cxx, removing the Boost dependencies, and making sure the path to "numpy/arrayobject.h" is included (e.g. via numpy.get_include()).
In python then you can use it like:
import cintegrate
import numpy as np
arr = np.random.randn(4,8,16) + 1j*np.random.randn(4,8,16)
# arbitrary step sizes dx = 1.0, dy = 0.5, dz = 0.25
ans = cintegrate.integrate3(arr, 1.0, 0.5, .25)
This specific code hasn't been tested but is mostly copied from working code.
Note added in edit.
As mentioned in the comments, python itself, being an interpreted language, has little potential for computational efficiency. So in order to make python scripts efficient, one must use modules which aren't all interpreted, but under the hood call compiled (and optimized) code written in, say, C/C++. This is exactly what numpy does for you, in particular for operations on whole arrays.
Therefore, the first step towards efficient python scripts is the usage of numpy. Only the second step is to try to use your own compiled (and optimized) code. Therefore, I have assumed in my example below that you were using numpy to store the array of complex numbers. Everything else would be ill-advised.
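To make that point concrete, here is a toy comparison of an interpreted loop against its numpy equivalent (on large arrays the vectorized version is typically orders of magnitude faster, since the loop runs in compiled code inside numpy):

```python
import numpy as np

z = np.arange(10000) * (0.5 + 0.5j)   # an array of complex numbers

# interpreted: one Python bytecode round-trip per element
total_loop = 0
for v in z:
    total_loop += v * v.conjugate()

# vectorized: the whole reduction happens inside numpy's compiled code
total_np = np.sum(z * z.conj())

print(np.isclose(total_loop, total_np))  # same result either way
```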
There are various ways in which you can access python's original data from within a C/C++ program. I personally have done this with boost.Python, but must warn you that the documentation and support are lousy at best: you're pretty much on your own (and stack overflow, of course).
For example your C++ file may look like this
// file.cc
#include <boost/python.hpp>
#include <boost/python/numpy.hpp>

namespace p = boost::python;
namespace n = p::numpy;

n::ndarray func(const n::ndarray &input, double control_variable)
{
    /*
      your code here, see the documentation for boost python;
      you can pass almost any python variable, it doesn't have to be numpy stuff
    */
}

BOOST_PYTHON_MODULE(module_name)
{
    Py_Initialize();
    n::initialize();   // only needed if you use numpy in the interface
    p::def("function", func, "doc-string");
}
To compile this, you may use a Python script such as
# setup.py
from distutils.core import setup
from distutils.extension import Extension

module_name = Extension(
    'module_name',
    extra_compile_args=['-std=c++11', '-stdlib=libc++', '-I/some/path/', '-march=native'],
    extra_link_args=['-stdlib=libc++'],
    sources=['file.cc'],
    libraries=['boost_python', 'boost_numpy'])

setup(
    name='module_name',
    version='0.1',
    ext_modules=[module_name])
and run it as python setup.py build, which will create an appropriate .so file in a sub-directory of build, which you can import from python.
For a non-template class I would write something like that.
But I don't know what I should do if my class is a template class.
I've tried something like the following, and it's not working:
extern "C" {
    Demodulator<double>* Foo_new_double() { return new Demodulator<double>(); }
    Demodulator<float>* Foo_new_float() { return new Demodulator<float>(); }
    void demodulateDoubleMatrix(Demodulator<double>* demodulator, double* input, int rows, int columns) { demodulator->demodulateMatrixPy(input, rows, columns); }
}
Note: Your question contradicts the code partially, so I'm ignoring the code for now.
C++ templates are an elaborate macro mechanism that gets resolved at compile time. In other words, the binary only contains the code from template instantiations (which is what you get when you apply parameters, typically types, to the template), and those are all that you can export from a binary to other languages. Exporting them is like exporting any regular type; see for example std::string.
Since the templates themselves don't survive compilation, there is no way that you can export them from a binary, not to C, not to Python, not even to C++! For the latter, you can provide the templates themselves though, but that doesn't include them in a binary.
Two assumptions:
Exporting/importing works via binaries. Of course, you could write an import that parses C++.
C++ once specified exported templates, but as far as I know this was never really implemented in the wild (and the feature was removed in C++11), so I left that option out.
The C++ language started as a superset of C: That is, it contains new keywords, syntax and capabilities that C does not provide. C does not have the concept of a class, has no concept of a member function and does not support the concept of access restrictions. C also does not support inheritance. The really big difference, however, is templates. C has macros, and that's it.
Therefore no, you can't directly expose C++ code to C in any fashion, you will have to use C-style code in your C++ to expose the C++ layer.
template<typename T> T foo(T i) { /* ... */ }

extern "C" int fooInt(int i) { return foo(i); }
However, C++ was originally basically a C code generator, and C++ can still interoperate (one way) with the C ABI: member functions are actually implemented by turning this->function(arg) into something like ThisClass_function_int(this, arg) under a mangled name. In theory, you could write something to do this to your code, perhaps leveraging clang.
But that's a non-trivial task, something that's already well-tackled by SWIG, Boost::Python and Cython.
The problem with templates, however, is that the compiler ignores them until you "instantiate" (use) them. std::vector<> is not a concrete thing until you specify std::vector<int> or something. And now the only concrete implementation of that is std::vector<int>. Until you've specified it somewhere, std::vector<string> doesn't exist in your binary.
You probably want to start by looking at something like this http://kos.gd/2013/01/5-ways-to-use-python-with-native-code/, select a tool, e.g. SWIG, and then start building an interface to expose what you want/need to C. This is a lot less work than building the wrappers yourself. Depending which tool you use, it may be as simple as writing a line saying using std::vector<int> or typedef std::vector<int> IntVector or something.
---- EDIT ----
The problem with a template class is that you are creating an entire type that C can't understand, consider:
template<typename T>
class Foo {
    T a;
    int b;
    T c;
public:
    Foo(T a_) : a(a_) {}
    void DoThing();
    T GetA() { return a; }
    int GetB() { return b; }
    T GetC() { return c; }
};
The C language doesn't support the class keyword, never mind understand that members a, b and c are private, or what a constructor is, and C doesn't understand member functions.
Again it doesn't understand templates so you'll need to do what C++ does automatically and generate an instantiation, by hand:
struct FooDouble {
    double a;
    int b;
    double c;
};
Except that in the original class all of those members are private. So do you really want to be exposing them? If not, you probably just need to typedef "FooDouble" to something the same size as Foo<double> and write a macro to do that.
Then you need to write replacements for the member functions. C doesn't understand constructors, so you will need to write a
extern "C" FooDouble* FooDouble_construct(double a);

FooDouble* FooDouble_construct(double a) {
    Foo<double>* foo = new Foo<double>(a);
    return reinterpret_cast<FooDouble*>(foo);
}
and a destructor
extern "C" void FooDouble_destruct(FooDouble* foo);

void FooDouble_destruct(FooDouble* foo) {
    delete reinterpret_cast<Foo<double>*>(foo);
}
and a similar pass-thru for the accessors.
I have a scenario where I need to pass around opaque void* pointers through my C++ <-> Python interface, implemented with SWIG (ver 1.3). I am able to return and accept void* in regular functions like this:
void* foo();
void goo( void* );
The problem begins when I try to do the same using generic functions (and this is what I actually need to do). I was able to deal with the "foo" scenario above with a function doing essentially something like this (stolen from the code generated by SWIG itself for function foo above):
PyObject*
foo(...)
{
    if( need to return void* )
        return SWIG_NewPointerObj(SWIG_as_voidptr(ptr), SWIGTYPE_p_void, 0 | 0 );
}
I am not convinced this is the best way to do it, but it works. The "goo" scenario is much more troublesome. Here I need to process some generic input, and while I can implement the conversion logic following the example in the code generated by SWIG itself for function goo above:
void
goo( PyObject* o )
{
    ...
    if( o is actually void* ) {    <==== <1>
        void* ptr;
        int res = SWIG_ConvertPtr( o, SWIG_as_voidptrptr(&ptr), 0, 0);
        if( SWIG_IsOK(res) ) do_goo( ptr );
    }
    ...
}
I do not have any way to implement the condition in line <1>. While for other types I can use functions like PyInt_Check, PyString_Check, etc., for void* I have no such option. On the Python side it is represented by an object of type PySwigObject, and while it definitely knows that it wraps a void* (I can see it if I print the value), I do not know how to access this information from the PyObject*.
I would appreciate any help figuring out any way to deal with this problem.
Thank you,
Am I right in thinking you essentially want some way of testing whether the incoming Python object is a void*? Presumably this is the same as testing whether it is a pointer to anything?
If so, what about using a boost::shared_ptr (or other pointer container) to hold your generic object? This would require some C++ code rewriting though, so might not be feasible.
We can extract a PyObject pointing to a Python method using
PyObject *method = PyDict_GetItemString(methodsDictionary,methodName.c_str());
I want to know how many arguments the method takes. So if the function is
def f(x, y):
    return x + y
how do I find out it needs 2 arguments?
I followed through the link provided by Jon. Assuming you don't want to (or can't) use Boost in your application, the following should get you the number (easily adapted from "How to find the number of parameters to a Python function from C?"):
PyObject *key, *value;
Py_ssize_t pos = 0;

while(PyDict_Next(methodsDictionary, &pos, &key, &value)) {
    if(PyCallable_Check(value)) {
        PyObject* fc = PyObject_GetAttrString(value, "func_code");
        if(fc) {
            PyObject* ac = PyObject_GetAttrString(fc, "co_argcount");
            if(ac) {
                const int count = PyInt_AsLong(ac);
                // we now have the argument count; do something with this function
                Py_DECREF(ac);
            }
            Py_DECREF(fc);
        }
    }
}
If you're in Python 2.x the above definitely works. In Python 3.0+, you seem to need to use "__code__" instead of "func_code" in the snippet above.
I appreciate the inability to use Boost (my company won't allow it for the projects I've been working on lately), but in general, I would try to knuckle down and use that if you can, as I've found that the Python C API can in general get a bit fiddly as you try to do complicated stuff like this.
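For reference, the attribute chain the C code reads is the same one visible from pure Python, which is handy for checking what value to expect:

```python
import inspect

def f(x, y):
    return x + y

# the attribute chain the C snippet above reads (Python 3 names):
print(f.__code__.co_argcount)                  # 2

# the higher-level, modern alternative:
print(len(inspect.signature(f).parameters))    # 2
```

Note that co_argcount counts only positional parameters; *args, **kwargs and keyword-only parameters are reported separately (e.g. via co_kwonlyargcount).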