Extending python - to swig, not to swig or Cython

I found the bottleneck in my Python code and played around with Psyco etc., then decided to write a C/C++ extension for performance.
With the help of SWIG you almost don't need to care about arguments etc. Everything works fine.
Now my question: SWIG creates a quite large .py file which does a lot of 'checkings' and 'PySwigObject' handling before calling the actual .pyd or .so code.
Does anyone have experience with whether there is more performance to gain if you hand-write this file instead of letting SWIG generate it?

You should consider Boost.Python if you are not planning to generate bindings for other languages as well with swig.
If you have a lot of functions and classes to bind, Py++ is a great tool that automatically generates the needed code to make the bindings.
Pybindgen may also be an option, but it's a new project and less complete than Boost.Python.
Edit:
Maybe I need to be more explicit about pros and cons.
Swig:
pro: you can generate bindings for many scripting languages.
cons: I don't like the way the parser works. I don't know if they have made progress since, but two years ago the C++ parser was quite limited. Most of the time I had to copy/paste my .h headers, add some % characters, and give extra hints to the SWIG parser.
I also needed to deal with the Python C-API from time to time for (not so) complicated type conversions.
I'm not using it anymore.
Boost.Python:
pro:
It's a very complete library. It allows you to do almost everything that is possible with the C-API, but in C++. I never had to write C-API code with this library. I also never encountered a bug due to the library. Code for bindings either works like a charm or refuses to compile.
It's probably one of the best solutions currently available if you already have some C++ library to bind. But if you only have a small C function to rewrite, I would probably try Cython.
cons: if you don't have a pre-compiled Boost.Python library you're going to use Bjam (a sort of make replacement). I really hate Bjam and its syntax.
Python libraries created with B.P tend to become obese. It also takes a lot of time to compile them.
Py++ (discontinued): it's Boost.Python made easy. Py++ uses a C++ parser to read your code and then generates Boost.Python code automatically. You also get great support from its author (no, it's not me ;-) ).
cons: only the problems due to Boost.Python itself. Update: As of 2014 this project looks discontinued.
Pybindgen:
It generates the code dealing with the C-API. You can either describe functions and classes in a Python file, or let Pybindgen read your headers and generate bindings automatically (for this it uses pygccxml, a Python library written by the author of Py++).
cons: it's a young project, with a smaller team than Boost.Python. There are still some limitations: you cannot use multiple inheritance for your C++ classes; callbacks are not handled automatically (custom callback handling code can be written, though); and Python exceptions are not translated to C.
It's definitely worth a good look.
A new one:
On 2009/01/20 the author of Py++ announced a new package for interfacing C/C++ code with Python. It is based on ctypes. I haven't tried it yet but I will! Note: this project looks discontinued, like Py++.
CFFI: I did not know of the existence of this one until very recently, so for now I cannot give my opinion. It looks like you can define C functions in Python strings and call them directly from the same Python module.
Cython: This is the method I'm currently using in my projects. Basically you write code in special .pyx files. Those files are compiled (translated) into C code, which in turn is compiled into CPython modules.
Cython code can look like regular Python (and in fact pure Python files are valid .pyx Cython files), but you can also add more information such as variable types. This optional typing allows Cython to generate faster C code. Code in Cython files can call pure Python functions as well as C and C++ functions (and also C++ methods).
It took me some time to think in Cython: calling C and C++ functions in the same code, mixing Python and C variables, and so on. But it's a very powerful language, with an active (in 2014) and friendly community.

SWIG 2.0.4 has introduced a new -builtin option that improves performance.
I did some benchmarking using an example program that does a lot of fast calls to a C++ extension.
I built the extension using boost.python, PyBindGen, SIP and SWIG with and without the -builtin option. Here are the results (average of 100 runs):
SWIG with -builtin 2.67s
SIP 2.70s
PyBindGen 2.74s
boost.python 3.07s
SWIG without -builtin 4.65s
SWIG used to be slowest. With the new -builtin option, SWIG seems to be fastest.
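The gap between the two SWIG rows is consistent with the Python-level proxy layer that SWIG generates by default and that -builtin is meant to avoid. The sketch below is not SWIG output. It only imitates such a layer with a hypothetical proxy_hypot wrapper around math.hypot and times both paths with timeit. Absolute numbers will vary by machine and Python version.

```python
import math
import timeit


def proxy_hypot(x, y):
    """Hypothetical stand-in for a generated proxy: check arguments, then forward."""
    if not isinstance(x, (int, float)) or not isinstance(y, (int, float)):
        raise TypeError("expected two numbers")
    return math.hypot(x, y)


def time_calls(func, repeats=3, number=100_000):
    """Return the best wall-clock time in seconds for `number` calls of func."""
    timer = timeit.Timer(lambda: func(3.0, 4.0))
    return min(timer.repeat(repeat=repeats, number=number))


direct = time_calls(math.hypot)    # straight into the C-implemented function
proxied = time_calls(proxy_hypot)  # one extra Python frame plus type checks

print(f"direct : {direct:.4f}s")
print(f"proxied: {proxied:.4f}s")
```

Taking the minimum of several repeats reduces noise from other processes. For a real comparison you would time the actual extension functions instead of math.hypot.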

You will certainly gain some performance by doing this by hand, but the gain will be very small compared to the effort required. I don't have any figures to give you, but I don't recommend it, because you will need to maintain the interface by hand, and this is not an option if your module is large!
You did the right thing in choosing a scripting language because you wanted rapid development. This way you've avoided the premature optimization syndrome, and now you want to optimize the bottleneck parts, great! But if you write the C/Python interface by hand you will fall into premature optimization for sure.
If you want something with less interface code, you can think about creating a DLL from your C code and using that library directly from Python with cstruct.
Consider also Cython if you want to use only Python code in your program.
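Before moving anything to C, it helps to confirm where the time actually goes. A minimal sketch using the standard cProfile and pstats modules follows. The functions slow_part, fast_part and main are hypothetical placeholders for your own code.

```python
import cProfile
import io
import pstats


def slow_part(n):
    """Hypothetical hot spot: pure-Python arithmetic in a loop."""
    return sum(i * i for i in range(n))


def fast_part(n):
    """Hypothetical cheap function, for contrast."""
    return n * 2


def main():
    return slow_part(200_000) + fast_part(10)


profiler = cProfile.Profile()
result = profiler.runcall(main)  # run main() under the profiler

# Collect the report into a string, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The functions at the top of the cumulative listing are the candidates worth rewriting as an extension.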

Using Cython is pretty good. You can write your C extension with a Python-like syntax and have it generate C code. Boilerplate included. Since you have the code already in python, you have to do just a few changes to your bottleneck code and C code will be generated from it.
Example. hello.pyx:
cdef int hello(int a, int b):
    return a + b
That generates 601 lines of boilerplate code:
/* Generated by Cython 0.10.3 on Mon Jan 19 08:24:44 2009 */
#define PY_SSIZE_T_CLEAN
#include "Python.h"
#include "structmember.h"
#ifndef PY_LONG_LONG
#define PY_LONG_LONG LONG_LONG
#endif
#ifndef DL_EXPORT
#define DL_EXPORT(t) t
#endif
#if PY_VERSION_HEX < 0x02040000
#define METH_COEXIST 0
#endif
#if PY_VERSION_HEX < 0x02050000
typedef int Py_ssize_t;
#define PY_SSIZE_T_MAX INT_MAX
#define PY_SSIZE_T_MIN INT_MIN
#define PyInt_FromSsize_t(z) PyInt_FromLong(z)
#define PyInt_AsSsize_t(o) PyInt_AsLong(o)
#define PyNumber_Index(o) PyNumber_Int(o)
#define PyIndex_Check(o) PyNumber_Check(o)
#endif
#if PY_VERSION_HEX < 0x02060000
#define Py_REFCNT(ob) (((PyObject*)(ob))->ob_refcnt)
#define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)
#define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size)
#define PyVarObject_HEAD_INIT(type, size) \
PyObject_HEAD_INIT(type) size,
#define PyType_Modified(t)
typedef struct {
void *buf;
PyObject *obj;
Py_ssize_t len;
Py_ssize_t itemsize;
int readonly;
int ndim;
char *format;
Py_ssize_t *shape;
Py_ssize_t *strides;
Py_ssize_t *suboffsets;
void *internal;
} Py_buffer;
#define PyBUF_SIMPLE 0
#define PyBUF_WRITABLE 0x0001
#define PyBUF_LOCK 0x0002
#define PyBUF_FORMAT 0x0004
#define PyBUF_ND 0x0008
#define PyBUF_STRIDES (0x0010 | PyBUF_ND)
#define PyBUF_C_CONTIGUOUS (0x0020 | PyBUF_STRIDES)
#define PyBUF_F_CONTIGUOUS (0x0040 | PyBUF_STRIDES)
#define PyBUF_ANY_CONTIGUOUS (0x0080 | PyBUF_STRIDES)
#define PyBUF_INDIRECT (0x0100 | PyBUF_STRIDES)
#endif
#if PY_MAJOR_VERSION < 3
#define __Pyx_BUILTIN_MODULE_NAME "__builtin__"
#else
#define __Pyx_BUILTIN_MODULE_NAME "builtins"
#endif
#if PY_MAJOR_VERSION >= 3
#define Py_TPFLAGS_CHECKTYPES 0
#define Py_TPFLAGS_HAVE_INDEX 0
#endif
#if (PY_VERSION_HEX < 0x02060000) || (PY_MAJOR_VERSION >= 3)
#define Py_TPFLAGS_HAVE_NEWBUFFER 0
#endif
#if PY_MAJOR_VERSION >= 3
#define PyBaseString_Type PyUnicode_Type
#define PyString_Type PyBytes_Type
#define PyInt_Type PyLong_Type
#define PyInt_Check(op) PyLong_Check(op)
#define PyInt_CheckExact(op) PyLong_CheckExact(op)
#define PyInt_FromString PyLong_FromString
#define PyInt_FromUnicode PyLong_FromUnicode
#define PyInt_FromLong PyLong_FromLong
#define PyInt_FromSize_t PyLong_FromSize_t
#define PyInt_FromSsize_t PyLong_FromSsize_t
#define PyInt_AsLong PyLong_AsLong
#define PyInt_AS_LONG PyLong_AS_LONG
#define PyInt_AsSsize_t PyLong_AsSsize_t
#define PyInt_AsUnsignedLongMask PyLong_AsUnsignedLongMask
#define PyInt_AsUnsignedLongLongMask PyLong_AsUnsignedLongLongMask
#define __Pyx_PyNumber_Divide(x,y) PyNumber_TrueDivide(x,y)
#else
#define __Pyx_PyNumber_Divide(x,y) PyNumber_Divide(x,y)
#define PyBytes_Type PyString_Type
#endif
#if PY_MAJOR_VERSION >= 3
#define PyMethod_New(func, self, klass) PyInstanceMethod_New(func)
#endif
#if !defined(WIN32) && !defined(MS_WINDOWS)
#ifndef __stdcall
#define __stdcall
#endif
#ifndef __cdecl
#define __cdecl
#endif
#else
#define _USE_MATH_DEFINES
#endif
#ifdef __cplusplus
#define __PYX_EXTERN_C extern "C"
#else
#define __PYX_EXTERN_C extern
#endif
#include <math.h>
#define __PYX_HAVE_API__helloworld
#ifdef __GNUC__
#define INLINE __inline__
#elif _WIN32
#define INLINE __inline
#else
#define INLINE
#endif
typedef struct
{PyObject **p; char *s; long n;
char is_unicode; char intern; char is_identifier;}
__Pyx_StringTabEntry; /*proto*/
static int __pyx_skip_dispatch = 0;
/* Type Conversion Predeclarations */
#if PY_MAJOR_VERSION < 3
#define __Pyx_PyBytes_FromString PyString_FromString
#define __Pyx_PyBytes_AsString PyString_AsString
#else
#define __Pyx_PyBytes_FromString PyBytes_FromString
#define __Pyx_PyBytes_AsString PyBytes_AsString
#endif
#define __Pyx_PyBool_FromLong(b) ((b) ? (Py_INCREF(Py_True), Py_True) : (Py_INCREF(Py_False), Py_False))
static INLINE int __Pyx_PyObject_IsTrue(PyObject* x);
static INLINE PY_LONG_LONG __pyx_PyInt_AsLongLong(PyObject* x);
static INLINE unsigned PY_LONG_LONG __pyx_PyInt_AsUnsignedLongLong(PyObject* x);
static INLINE Py_ssize_t __pyx_PyIndex_AsSsize_t(PyObject* b);
#define __pyx_PyInt_AsLong(x) (PyInt_CheckExact(x) ? PyInt_AS_LONG(x) : PyInt_AsLong(x))
#define __pyx_PyFloat_AsDouble(x) (PyFloat_CheckExact(x) ? PyFloat_AS_DOUBLE(x) : PyFloat_AsDouble(x))
static INLINE unsigned char __pyx_PyInt_unsigned_char(PyObject* x);
static INLINE unsigned short __pyx_PyInt_unsigned_short(PyObject* x);
static INLINE char __pyx_PyInt_char(PyObject* x);
static INLINE short __pyx_PyInt_short(PyObject* x);
static INLINE int __pyx_PyInt_int(PyObject* x);
static INLINE long __pyx_PyInt_long(PyObject* x);
static INLINE signed char __pyx_PyInt_signed_char(PyObject* x);
static INLINE signed short __pyx_PyInt_signed_short(PyObject* x);
static INLINE signed int __pyx_PyInt_signed_int(PyObject* x);
static INLINE signed long __pyx_PyInt_signed_long(PyObject* x);
static INLINE long double __pyx_PyInt_long_double(PyObject* x);
#ifdef __GNUC__
/* Test for GCC > 2.95 */
#if __GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95))
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else /* __GNUC__ > 2 ... */
#define likely(x) (x)
#define unlikely(x) (x)
#endif /* __GNUC__ > 2 ... */
#else /* __GNUC__ */
#define likely(x) (x)
#define unlikely(x) (x)
#endif /* __GNUC__ */
static PyObject *__pyx_m;
static PyObject *__pyx_b;
static PyObject *__pyx_empty_tuple;
static int __pyx_lineno;
static int __pyx_clineno = 0;
static const char * __pyx_cfilenm= __FILE__;
static const char *__pyx_filename;
static const char **__pyx_f;
static void __Pyx_AddTraceback(const char *funcname); /*proto*/
/* Type declarations */
/* Module declarations from helloworld */
static int __pyx_f_10helloworld_hello(int, int); /*proto*/
/* Implementation of helloworld */
/* "/home/nosklo/devel/ctest/hello.pyx":1
* cdef int hello(int a, int b): # <<<<<<<<<<<<<<
* return a + b
*
*/
static int __pyx_f_10helloworld_hello(int __pyx_v_a, int __pyx_v_b) {
int __pyx_r;
/* "/home/nosklo/devel/ctest/hello.pyx":2
* cdef int hello(int a, int b):
* return a + b # <<<<<<<<<<<<<<
*
*/
__pyx_r = (__pyx_v_a + __pyx_v_b);
goto __pyx_L0;
__pyx_r = 0;
__pyx_L0:;
return __pyx_r;
}
static struct PyMethodDef __pyx_methods[] = {
{0, 0, 0, 0}
};
static void __pyx_init_filenames(void); /*proto*/
#if PY_MAJOR_VERSION >= 3
static struct PyModuleDef __pyx_moduledef = {
PyModuleDef_HEAD_INIT,
"helloworld",
0, /* m_doc */
-1, /* m_size */
__pyx_methods /* m_methods */,
NULL, /* m_reload */
NULL, /* m_traverse */
NULL, /* m_clear */
NULL /* m_free */
};
#endif
static int __Pyx_InitCachedBuiltins(void) {
return 0;
return -1;
}
static int __Pyx_InitGlobals(void) {
return 0;
return -1;
}
#if PY_MAJOR_VERSION < 3
PyMODINIT_FUNC inithelloworld(void); /*proto*/
PyMODINIT_FUNC inithelloworld(void)
#else
PyMODINIT_FUNC PyInit_helloworld(void); /*proto*/
PyMODINIT_FUNC PyInit_helloworld(void)
#endif
{
__pyx_empty_tuple = PyTuple_New(0);
if (unlikely(!__pyx_empty_tuple))
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 1;
__pyx_clineno = __LINE__; goto __pyx_L1_error;}
/*--- Library function declarations ---*/
__pyx_init_filenames();
/*--- Initialize various global constants etc. ---*/
if (unlikely(__Pyx_InitGlobals() < 0))
{__pyx_filename = __pyx_f[0];
__pyx_lineno = 1;
__pyx_clineno = __LINE__;
goto __pyx_L1_error;}
/*--- Module creation code ---*/
#if PY_MAJOR_VERSION < 3
__pyx_m = Py_InitModule4("helloworld", __pyx_methods, 0, 0, PYTHON_API_VERSION);
#else
__pyx_m = PyModule_Create(&__pyx_moduledef);
#endif
if (!__pyx_m)
{__pyx_filename = __pyx_f[0];
__pyx_lineno = 1; __pyx_clineno = __LINE__;
goto __pyx_L1_error;};
#if PY_MAJOR_VERSION < 3
Py_INCREF(__pyx_m);
#endif
__pyx_b = PyImport_AddModule(__Pyx_BUILTIN_MODULE_NAME);
if (!__pyx_b)
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 1;
__pyx_clineno = __LINE__; goto __pyx_L1_error;};
if (PyObject_SetAttrString(__pyx_m, "__builtins__", __pyx_b) < 0)
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 1;
__pyx_clineno = __LINE__; goto __pyx_L1_error;};
/*--- Builtin init code ---*/
if (unlikely(__Pyx_InitCachedBuiltins() < 0))
{__pyx_filename = __pyx_f[0]; __pyx_lineno = 1;
__pyx_clineno = __LINE__; goto __pyx_L1_error;}
__pyx_skip_dispatch = 0;
/*--- Global init code ---*/
/*--- Function export code ---*/
/*--- Type init code ---*/
/*--- Type import code ---*/
/*--- Function import code ---*/
/*--- Execution code ---*/
/* "/home/nosklo/devel/ctest/hello.pyx":1
* cdef int hello(int a, int b): # <<<<<<<<<<<<<<
* return a + b
*
*/
#if PY_MAJOR_VERSION < 3
return;
#else
return __pyx_m;
#endif
__pyx_L1_error:;
__Pyx_AddTraceback("helloworld");
#if PY_MAJOR_VERSION >= 3
return NULL;
#endif
}
static const char *__pyx_filenames[] = {
"hello.pyx",
};
/* Runtime support code */
static void __pyx_init_filenames(void) {
__pyx_f = __pyx_filenames;
}
#include "compile.h"
#include "frameobject.h"
#include "traceback.h"
static void __Pyx_AddTraceback(const char *funcname) {
PyObject *py_srcfile = 0;
PyObject *py_funcname = 0;
PyObject *py_globals = 0;
PyObject *empty_string = 0;
PyCodeObject *py_code = 0;
PyFrameObject *py_frame = 0;
#if PY_MAJOR_VERSION < 3
py_srcfile = PyString_FromString(__pyx_filename);
#else
py_srcfile = PyUnicode_FromString(__pyx_filename);
#endif
if (!py_srcfile) goto bad;
if (__pyx_clineno) {
#if PY_MAJOR_VERSION < 3
py_funcname = PyString_FromFormat( "%s (%s:%d)", funcname,
__pyx_cfilenm, __pyx_clineno);
#else
py_funcname = PyUnicode_FromFormat( "%s (%s:%d)", funcname,
__pyx_cfilenm, __pyx_clineno);
#endif
}
else {
#if PY_MAJOR_VERSION < 3
py_funcname = PyString_FromString(funcname);
#else
py_funcname = PyUnicode_FromString(funcname);
#endif
}
if (!py_funcname) goto bad;
py_globals = PyModule_GetDict(__pyx_m);
if (!py_globals) goto bad;
#if PY_MAJOR_VERSION < 3
empty_string = PyString_FromStringAndSize("", 0);
#else
empty_string = PyBytes_FromStringAndSize("", 0);
#endif
if (!empty_string) goto bad;
py_code = PyCode_New(
0, /*int argcount,*/
#if PY_MAJOR_VERSION >= 3
0, /*int kwonlyargcount,*/
#endif
0, /*int nlocals,*/
0, /*int stacksize,*/
0, /*int flags,*/
empty_string, /*PyObject *code,*/
__pyx_empty_tuple, /*PyObject *consts,*/
__pyx_empty_tuple, /*PyObject *names,*/
__pyx_empty_tuple, /*PyObject *varnames,*/
__pyx_empty_tuple, /*PyObject *freevars,*/
__pyx_empty_tuple, /*PyObject *cellvars,*/
py_srcfile, /*PyObject *filename,*/
py_funcname, /*PyObject *name,*/
__pyx_lineno, /*int firstlineno,*/
empty_string /*PyObject *lnotab*/
);
if (!py_code) goto bad;
py_frame = PyFrame_New(
PyThreadState_GET(), /*PyThreadState *tstate,*/
py_code, /*PyCodeObject *code,*/
py_globals, /*PyObject *globals,*/
0 /*PyObject *locals*/
);
if (!py_frame) goto bad;
py_frame->f_lineno = __pyx_lineno;
PyTraceBack_Here(py_frame);
bad:
Py_XDECREF(py_srcfile);
Py_XDECREF(py_funcname);
Py_XDECREF(empty_string);
Py_XDECREF(py_code);
Py_XDECREF(py_frame);
}
/* Type Conversion Functions */
static INLINE Py_ssize_t __pyx_PyIndex_AsSsize_t(PyObject* b) {
Py_ssize_t ival;
PyObject* x = PyNumber_Index(b);
if (!x) return -1;
ival = PyInt_AsSsize_t(x);
Py_DECREF(x);
return ival;
}
static INLINE int __Pyx_PyObject_IsTrue(PyObject* x) {
if (x == Py_True) return 1;
else if (x == Py_False) return 0;
else return PyObject_IsTrue(x);
}
static INLINE PY_LONG_LONG __pyx_PyInt_AsLongLong(PyObject* x) {
if (PyInt_CheckExact(x)) {
return PyInt_AS_LONG(x);
}
else if (PyLong_CheckExact(x)) {
return PyLong_AsLongLong(x);
}
else {
PY_LONG_LONG val;
PyObject* tmp = PyNumber_Int(x); if (!tmp) return (PY_LONG_LONG)-1;
val = __pyx_PyInt_AsLongLong(tmp);
Py_DECREF(tmp);
return val;
}
}
static INLINE unsigned PY_LONG_LONG __pyx_PyInt_AsUnsignedLongLong(PyObject* x) {
if (PyInt_CheckExact(x)) {
long val = PyInt_AS_LONG(x);
if (unlikely(val < 0)) {
PyErr_SetString(PyExc_TypeError, "Negative assignment to unsigned type.");
return (unsigned PY_LONG_LONG)-1;
}
return val;
}
else if (PyLong_CheckExact(x)) {
return PyLong_AsUnsignedLongLong(x);
}
else {
PY_LONG_LONG val;
PyObject* tmp = PyNumber_Int(x); if (!tmp) return (PY_LONG_LONG)-1;
val = __pyx_PyInt_AsUnsignedLongLong(tmp);
Py_DECREF(tmp);
return val;
}
}
static INLINE unsigned char __pyx_PyInt_unsigned_char(PyObject* x) {
if (sizeof(unsigned char) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
unsigned char val = (unsigned char)long_val;
if (unlikely((val != long_val) || (long_val < 0))) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to unsigned char");
return (unsigned char)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE unsigned short __pyx_PyInt_unsigned_short(PyObject* x) {
if (sizeof(unsigned short) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
unsigned short val = (unsigned short)long_val;
if (unlikely((val != long_val) || (long_val < 0))) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to unsigned short");
return (unsigned short)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE char __pyx_PyInt_char(PyObject* x) {
if (sizeof(char) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
char val = (char)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to char");
return (char)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE short __pyx_PyInt_short(PyObject* x) {
if (sizeof(short) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
short val = (short)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to short");
return (short)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE int __pyx_PyInt_int(PyObject* x) {
if (sizeof(int) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
int val = (int)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to int");
return (int)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE long __pyx_PyInt_long(PyObject* x) {
if (sizeof(long) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
long val = (long)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to long");
return (long)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE signed char __pyx_PyInt_signed_char(PyObject* x) {
if (sizeof(signed char) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
signed char val = (signed char)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to signed char");
return (signed char)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE signed short __pyx_PyInt_signed_short(PyObject* x) {
if (sizeof(signed short) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
signed short val = (signed short)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to signed short");
return (signed short)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE signed int __pyx_PyInt_signed_int(PyObject* x) {
if (sizeof(signed int) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
signed int val = (signed int)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to signed int");
return (signed int)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE signed long __pyx_PyInt_signed_long(PyObject* x) {
if (sizeof(signed long) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
signed long val = (signed long)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to signed long");
return (signed long)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}
static INLINE long double __pyx_PyInt_long_double(PyObject* x) {
if (sizeof(long double) < sizeof(long)) {
long long_val = __pyx_PyInt_AsLong(x);
long double val = (long double)long_val;
if (unlikely((val != long_val) )) {
PyErr_SetString(PyExc_OverflowError, "value too large to convert to long double");
return (long double)-1;
}
return val;
}
else {
return __pyx_PyInt_AsLong(x);
}
}

An observation: Based on the benchmarking conducted by the pybindgen developers, there is no significant difference between boost.python and swig. I haven't done my own benchmarking to verify how much of this depends on the proper use of the boost.python functionality.
Note also that there may be a reason that pybindgen seems to be in general quite a bit faster than swig and boost.python: it may not produce as versatile a binding as the other two. For instance, exception propagation, call argument type checking, etc. I haven't had a chance to use pybindgen yet but I intend to.
Boost is in general quite a big package to install, and last I saw you can't just install Boost.Python; you pretty much need the whole Boost library. As others have mentioned, compilation will be slow due to heavy use of template programming, which also typically means rather cryptic error messages at compile time.
Summary: given how easy SWIG is to install and use, that it generates decent bindings that are robust and versatile, and that one interface file allows your C++ DLL to be available from several other languages like Lua, C#, and Java, I would favor it over boost.python. But unless you really need multi-language support I would take a close look at PyBindGen because of its purported speed, and pay close attention to the robustness and versatility of the bindings it generates.

There be dragons here. Don't swig, don't boost. For any complicated project the code you have to fill in yourself to make them work becomes unmanageable quickly. If it's a plain C API to your library (no classes), you can just use ctypes. It will be easy and painless, and you won't have to spend hours trawling through the documentation for these labyrinthine wrapper projects trying to find the one tiny note about the feature you need.
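As a rough illustration of the ctypes route, the sketch below calls strlen from the C runtime. It assumes a POSIX system where the C library can be located. For your own library you would pass the path of your .so or .dll to ctypes.CDLL instead.

```python
import ctypes
import ctypes.util

# Locate the C runtime. If find_library returns None, CDLL(None) falls back
# to the symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the signature so ctypes converts arguments and results correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello")
print(length)
```

Declaring argtypes and restype is optional, but without them ctypes assumes int return values, which can silently truncate pointers or sizes.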

Since you are concerned with speed and overhead, I suggest considering PyBindGen.
I have experience using it to wrap a large internal C++ library. After trying SWIG, SIP, and Boost.Python I prefer PyBindGen for the following reasons:
A PyBindGen wrapper is pure-Python, no need to learn another file format
PyBindGen generates Python C API calls directly, there is no speed-robbing indirection layer like SWIG.
The generated C code is clean and simple to understand. I like Cython too, but trying to read its C output can be difficult at times.
STL sequence containers are supported (we use a lot of std::vector's)

If it's not a big extension, boost::python might also be an option. It executes faster than SWIG because you control what's happening, but it will take longer to develop.
Anyway, SWIG's overhead is acceptable if the amount of work within a single call is large enough. For example, if your issue is that you have some medium-sized logic block you want to move to C/C++, but that block is called frequently within a tight loop, you might have to avoid SWIG. I can't really think of any real-world examples, though, except for scripted graphics shaders.
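The point about work per call can be illustrated without any binding generator. The sketch below uses only standard-library functions implemented in C: many_small_calls crosses into C once per element, while one_big_call crosses once for the whole batch. Both function names are made up for this example, and the timings are only indicative.

```python
import operator
import timeit

values = list(range(100_000))


def many_small_calls(data):
    """Call a C-implemented function once per element (tight loop)."""
    total = 0
    for v in data:
        total = operator.add(total, v)
    return total


def one_big_call(data):
    """Hand the whole batch to a C-implemented function in a single call."""
    return sum(data)


# Both strategies compute the same result; only the number of calls differs.
assert many_small_calls(values) == one_big_call(values)

per_element = min(timeit.repeat(lambda: many_small_calls(values), number=5, repeat=3))
batched = min(timeit.repeat(lambda: one_big_call(values), number=5, repeat=3))

print(f"one call per element: {per_element:.4f}s")
print(f"one call per batch  : {batched:.4f}s")
```

If an extension can be designed to take the whole batch at once, the per-call wrapper overhead of any binding tool matters much less.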

Before giving up on your python code, have a look at ShedSkin. They claim better performance than Psyco on some code (and also state that it is still experimental).
Otherwise, there are several choices for binding C/C++ code to Python.
Boost is lengthy to compile but is really the most flexible and easy-to-use solution.
I have never used SWIG, but compared to Boost it's not as flexible, as it's a generic binding framework, not a framework dedicated to Python.
The next choice is Pyrex. It allows you to write pseudo-Python code that gets compiled as a C extension.

There is an article worth reading on the topic: Cython, pybind11, cffi – which tool should you choose?
Quick recap for the impatient:
Cython compiles your python to C/C++ allowing you to embed your C/C++ into python code. Uses static binding. For python programmers.
pybind11 (and boost.python) is the opposite. Bind your stuff at compile time from the C++ side. For C++ programmers.
CFFI allows you to bind the native stuff dynamically at runtime. Simple to use, but higher performance penalty.

Related

Strange memory behaviour when using Python C API

I am trying to implement a Python wrapper using the Python C API over a C++ library. I need to implement conversions so I can use objects in Python and C++. I have already done that in the past, but I have an error that I am really having a hard time with.
I have a very basic test function:
PyObject* convert_to_python() {
    std::cout << "Convert to PyObject" << std::endl;
    long int a = 20;
    PyObject* py_a = PyInt_FromLong(a);
    std::cout << "Convert to PyObject ok" << std::endl;
    return py_a;
}
I call this function inside a GoogleTest macro:
TEST(Wrapper, ConvertTest) {
    PyObject *py_m = convert_to_python();
}
And my output is:
Convert to PyObject
Segmentation fault (core dumped)
I also ran valgrind on it:
valgrind --tool=memcheck --track-origins=yes --leak-check=full ./my_convert
But it doesn't give me much information about it:
Invalid read of size 8
==19030== at 0x4F70A7B: PyInt_FromLong (in /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0)
==19030== by 0x541E6BF: _object* pysmud_from<float>(smu::Matrix<float, 0, 0>&) (smu_type_conversions.cpp:308)
==19030== by 0x43A144: (anonymous namespace)::Wrapper_ConvertMatrix_Test::Body() (test_wrapper.cpp:12)
==19030== by 0x43A0C6: (anonymous namespace)::Wrapper_ConvertMatrix_Test::TestBody() (test_wrapper.cpp:10)
==19030== by 0x465B4D: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2078)
==19030== by 0x460684: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2114)
==19030== by 0x444C05: testing::Test::Run() (gtest.cc:2151)
==19030== by 0x4454C9: testing::TestInfo::Run() (gtest.cc:2326)
==19030== by 0x445BEA: testing::TestCase::Run() (gtest.cc:2444)
==19030== by 0x44CF41: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:4315)
==19030== by 0x46712C: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2078)
==19030== by 0x461532: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2114)
==19030== Address 0x0 is not stack'd, malloc'd or (recently) free'd
I think this code should work, but I can't see what is wrong with what I wrote. Did I include or link the Python files and libraries incorrectly?
EDIT: Gives no errors
#include <Python.h>
PyObject* convert_long_int(long int a) {
    PyObject *ret = PyInt_FromLong(a);
    return ret;
}
int main(void) {
    long int a = 65454984;
    PyObject *pya = convert_long_int(a);
    return 0;
}
If compiling with gcc -o wraptest -I/usr/include/python2.7 wraptest.c -L/usr/lib/x86_64-linux-gnu/ -lpython2.7
What does the initialization do ?

I can confirm the segmentation fault on Ubuntu 16.04 and Python 2.7, if I omit the initialization.
Looking at Embedding Python in Another Application, there's this example
#include <Python.h>
int
main(int argc, char *argv[])
{
    Py_SetProgramName(argv[0]); /* optional but recommended */
    Py_Initialize();
    PyRun_SimpleString("from time import time,ctime\n"
                       "print 'Today is',ctime(time())\n");
    Py_Finalize();
    return 0;
}
So when I do an equivalent minimal main
int main()
{
    Py_Initialize();
    PyObject *p = convert_to_python();
    Py_Finalize();
    return 0;
}
it works without crash.
The difference between the two examples is
long int a = 20;
and
long int a = 65454984;
I guess, it has to do with PyInt_FromLong(long ival)
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object.
Maybe Python tries to access an uninitialized pointer or memory range without the initialization.
When I change the example using a = 256, it crashes. Using a = 257, it doesn't.
Looking at cpython/Objects/intobject.c:79, you can see an array of pointers
static PyIntObject *small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
which is accessed right below in PyInt_FromLong(long ival)
v = small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);
But without initialization from _PyInt_Init(void)
for (ival = -NSMALLNEGINTS; ival < NSMALLPOSINTS; ival++) {
    if (!free_list && (free_list = fill_free_list()) == NULL)
        return 0;
    /* PyObject_New is inlined */
    v = free_list;
    free_list = (PyIntObject *)Py_TYPE(v);
    (void)PyObject_INIT(v, &PyInt_Type);
    v->ob_ival = ival;
    small_ints[ival + NSMALLNEGINTS] = v;
}
these pointers are all NULL, causing the crash.
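The small-integer array can also be observed from the Python side of an initialized interpreter. The sketch below relies on a CPython implementation detail, so treat the results as illustrative rather than guaranteed by the language. The helper names fresh_int and is_shared are made up for this example.

```python
def fresh_int(n):
    """Build the integer at run time so the compiler cannot share a constant."""
    return int(str(n))


def is_shared(n):
    """True if two independently created ints of value n are the same object."""
    return fresh_int(n) is fresh_int(n)


# On CPython, values inside the cached range are typically reported as shared
# and values outside it are not.
for n in (-5, 0, 256, 257):
    print(n, is_shared(n))
```

This matches the behaviour seen above: values inside the cached range go through the preallocated array, which only exists after initialization, while larger values take the normal allocation path.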

Running C extension in Python faster than plain C

I have implemented a Python extension in C and found that executing a C function from inside Python is 2x faster than executing the same C code from a C main.
But why is this faster? I would expect the plain C code to have exactly the same performance when called from Python as when called from C.
Here is my experiment:
Plain C compute code (simple triple-loop matrix-matrix multiplication)
Plain C main function that calls the mmult() function
Python extension wrapper to call the mmult() function
All timing is happening entirely within the C code
Here are my results:
Pure C - 85us
Python Extension - 36us
Here's my code:
--mmult.cpp----------
#include <stdio.h>     /* printf */
#include <sys/time.h>  /* gettimeofday, struct timeval */
#include "mmult.h"
void mmult(int32_t a[1024],int32_t b[1024],int32_t c[1024]) {
    struct timeval t1, t2;
    gettimeofday(&t1, NULL);
    for(int i=0; i<32; i=i+1) {
        for(int j=0; j<32; j=j+1) {
            int32_t result=0;
            for(int k=0; k<32; k=k+1) {
                result+=a[i*32+k]*b[k*32+j];
            }
            c[i*32+j] = result;
        }
    }
    gettimeofday(&t2, NULL);
    double elapsedTime = (t2.tv_usec - t1.tv_usec) + (t2.tv_sec - t1.tv_sec)*1000000;
    printf("elapsed time: %fus\n",elapsedTime);
}
--mmult.h-------
#include <stdint.h>
void mmult(int32_t a[1024],int32_t b[1024],int32_t c[1024]);
--main.cpp------
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include "mmult.h"
int main() {
int* a = (int*)malloc(sizeof(int)*1024);
int* b = (int*)malloc(sizeof(int)*1024);
int* c = (int*)malloc(sizeof(int)*1024);
for(int i=0; i<1024; i++) {
a[i]=i+1;
b[i]=i+1;
c[i]=0;
}
struct timeval t1, t2;
gettimeofday(&t1, NULL);
mmult(a,b,c);
gettimeofday(&t2, NULL);
double elapsedTime = (t2.tv_usec - t1.tv_usec) + (t2.tv_sec - t1.tv_sec)*1000000;
printf("elapsed time: %fus\n",elapsedTime);
free(a);
free(b);
free(c);
return 0;
}
Here's how I compile main:
gcc -o main main.cpp mmult.cpp -O3
--wrapper.cpp-----
#include <Python.h>
#include <numpy/arrayobject.h>
#include "mmult.h"
static PyObject* mmult_wrapper(PyObject* self, PyObject* args) {
int32_t* a;
PyArrayObject* a_obj = NULL;
int32_t* b;
PyArrayObject* b_obj = NULL;
int32_t* c;
PyArrayObject* c_obj = NULL;
int res = PyArg_ParseTuple(args, "OOO", &a_obj, &b_obj, &c_obj);
if (!res)
return NULL;
a = (int32_t*) PyArray_DATA(a_obj);
b = (int32_t*) PyArray_DATA(b_obj);
c = (int32_t*) PyArray_DATA(c_obj);
/* call function */
mmult(a,b,c);
Py_RETURN_NONE;
}
/* define functions in module */
static PyMethodDef TheMethods[] = {
{"mmult_wrapper", mmult_wrapper, METH_VARARGS, "your c function"},
{NULL, NULL, 0, NULL}
};
static struct PyModuleDef cModPyDem = {
PyModuleDef_HEAD_INIT,
"mmult", "Some documentation",
-1,
TheMethods
};
PyMODINIT_FUNC
PyInit_c_module(void) {
PyObject* retval = PyModule_Create(&cModPyDem);
import_array();
return retval;
}
--setup.py-----
import os
import numpy
from distutils.core import setup, Extension
cur = os.path.dirname(os.path.realpath(__file__))
c_module = Extension("c_module", sources=["wrapper.cpp","mmult.cpp"],include_dirs=[cur,numpy.get_include()])
setup(ext_modules=[c_module])
--code.py-----
import c_module
import time
import numpy as np
if __name__ == "__main__":
a = np.ndarray((32,32),dtype='int32',buffer=np.linspace(1,1024,1024,dtype='int32').reshape(32,32))
b = np.ndarray((32,32),dtype='int32',buffer=np.linspace(1,1024,1024,dtype='int32').reshape(32,32))
c = np.ndarray((32,32),dtype='int32',buffer=np.zeros((32,32),dtype='int32'))
c_module.mmult_wrapper(a,b,c)
Here's how I compile the Python extension:
python3.6 setup_sw.py build_ext --inplace
UPDATE
I've updated the mmult.cpp code to run the triple for loop for 1,000,000 iterations internally. This resulted in very similar times:
Pure C - 27us
Python Extension - 27us

85 microseconds is too short an interval to measure reliably and reproducibly. For example, CPU cache effects (or context switches, or paging) may dominate the computation time and make that timing meaningless.
(I guess you are on Linux/x86-64)
As a rule of thumb, try to have a run lasting about half a second at least, and repeat the benchmarking a few times. You could also use time(1) for measurements.
See also time(7). There are several notions of time (elapsed "real" time, monotonic time, process cpu time, thread cpu time, etc...). You could consider using clock(3) or clock_gettime(2) to measure time.
BTW, you might compile with a more recent version of GCC (in November 2017, GCC 7, and in a few weeks GCC 8), and you should compile with gcc -march=native -O3 for benchmarking purposes. Try other optimization options and tuning as well. You could also try another compiler, e.g. Clang/LLVM.
Look also at this answer (regarding parallelization) to a relevant question. Probably the numpy package is using (internally) similar techniques (outside of the Python GIL), so could be faster than your naive sequential matrix multiplication code in C.
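The advice above (run long enough, repeat, keep the best figure) can be sketched on the Python side as follows. This is an illustrative harness, not the asker's code: mmult here is a pure-Python stand-in for the C function, and best_time, repeats and inner are hypothetical names chosen for the example.

```python
import time

def mmult(a, b, n=32):
    """Naive triple-loop product of two n*n matrices stored as flat lists."""
    c = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc += a[i * n + k] * b[k * n + j]
            c[i * n + j] = acc
    return c

def best_time(func, *args, repeats=5, inner=5):
    """Return the best per-call wall time in seconds over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()      # high-resolution monotonic clock
        for _ in range(inner):           # several calls per measurement
            func(*args)
        elapsed = (time.perf_counter() - start) / inner
        best = min(best, elapsed)        # the minimum is least disturbed by noise
    return best

a = [i + 1 for i in range(1024)]
b = [i + 1 for i in range(1024)]
print("best per-call time: %.1f us" % (best_time(mmult, a, b) * 1e6))
```

Taking the minimum over repeats is one common convention; reporting the median or the full spread is equally defensible, as long as a single 85 us sample is not treated as the answer.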

Does PyString_AS_STRING work differently on 64-bit Windows vs 32-bit?

I have the following function:
void py_get_var( const char** var_name, int* found, char** resultado )
{
*found = 0;
PyObject * module = PyImport_AddModule("__main__");
PyObject * dictionary = PyModule_GetDict(module);
PyObject * result = PyDict_GetItemString(dictionary, *var_name );
if( result == NULL ){
*found = 1;
*resultado = "";
return;
}
#ifdef PY3K
*resultado = PyBytes_AS_STRING( PyUnicode_AsUTF8String(result) );
#else
*resultado = PyString_AS_STRING(result);
#endif
}
which attempts to retrieve a string (it is always a string) from an embedded Python session. It works as intended on Linux, Mac, and Win32 platforms (and several versions of Python).
However, it returns an empty string on Win64. I'm using the GCC compiler on Windows.
Any idea of what can be the reason?

Cython: Inline Function not pure C

I have the following inline function for Cython
cpdef inline int c_rate2recs_2(int maxNN,int idx):
cdef int out=idx%maxNN
return out
However this translates into
/*
* return out
*
* cpdef inline int c_rate2recs_2(int maxNN,int idx): # <<<<<<<<<<<<<<
* cdef int out=idx%maxNN
* return out
*/
static PyObject *__pyx_pw_6kmc_cy_5c_rate2recs_2(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static CYTHON_INLINE int __pyx_f_6kmc_cy_c_rate2recs_2(int __pyx_v_maxNN, int __pyx_v_idx, CYTHON_UNUSED int __pyx_skip_dispatch) {
int __pyx_v_out;
int __pyx_r;
__Pyx_TraceDeclarations
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("c_rate2recs_2", 0);
__Pyx_TraceCall("c_rate2recs_2", __pyx_f[0], 984);
/*
* return out
*
* cpdef inline int c_rate2recs_2(int maxNN,int idx): # <<<<<<<<<<<<<<
* cdef int out=idx%maxNN
* return out
*/
static PyObject *__pyx_pf_6kmc_cy_4c_rate2recs_2(CYTHON_UNUSED PyObject *__pyx_self, int __pyx_v_maxNN, int __pyx_v_idx) {
PyObject *__pyx_r = NULL;
__Pyx_TraceDeclarations
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("c_rate2recs_2", 0);
__Pyx_TraceCall("c_rate2recs_2", __pyx_f[0], 984);
__Pyx_XDECREF(__pyx_r);
__pyx_t_1 = PyInt_FromLong(__pyx_f_6kmc_cy_c_rate2recs_2(__pyx_v_maxNN, __pyx_v_idx, 0)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 984; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
__Pyx_GOTREF(__pyx_t_1);
__pyx_r = __pyx_t_1;
__pyx_t_1 = 0;
goto __pyx_L0;
__pyx_r = Py_None; __Pyx_INCREF(Py_None);
goto __pyx_L0;
__pyx_L1_error:;
__Pyx_XDECREF(__pyx_t_1);
__Pyx_AddTraceback("kmc_cy.c_rate2recs_2", __pyx_clineno, __pyx_lineno, __pyx_filename);
__pyx_r = NULL;
__pyx_L0:;
__Pyx_XGIVEREF(__pyx_r);
__Pyx_TraceReturn(__pyx_r);
__Pyx_RefNannyFinishContext();
return __pyx_r;
}
As I am pretty new in the cython business, I would like to know how to get rid of most of the Python commands (cython -a flags this inline as pretty far away from pure C).

As I am pretty new in the cython business, I would like to know how to get rid of most of the python commands (cython -a flags this inline as pretty far away from pure C)
The trick is that if you can declare your function nogil:
cpdef inline int c_rate2recs_2(int maxNN,int idx) nogil:
cdef int out=idx%maxNN
return out
then whatever yellow you still see generally isn't calling into Python. It could be an error-handling path, for example, or some other kind of mild checking. For a cpdef, Cython generates not only a pure-C function but also a Python wrapper for calls from Python scope. This will not affect the speed of the C function.
In this case, timings against a manually inlined loop showed no slowdown, and removing inline did not change the timing either. I imagine a case that is harder to optimise may behave differently, but the key is to profile.
Finally, speed-ups and removal of some of the error-checking can be had by using compiler directives.
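One directive that is relevant to a function built around idx%maxNN is cdivision. With it off (the default), Cython makes % on C integers follow Python semantics and checks for a zero divisor, which accounts for some of the extra generated code. The pure-Python sketch below only illustrates the semantic difference; c_mod is a hypothetical helper that emulates C's truncating remainder.

```python
import math

def py_mod(a, b):
    """Python remainder: the result takes the sign of the divisor."""
    return a % b

def c_mod(a, b):
    """Emulate C's % for ints: the result takes the sign of the dividend."""
    return int(math.fmod(a, b))

for a, b in ((7, 3), (-7, 3), (7, -3)):
    print(a, b, py_mod(a, b), c_mod(a, b))
```

For non-negative operands the two agree, so if idx and maxNN are known to be non-negative and maxNN is never zero, enabling cdivision should not change results.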

SegFault when trying to write to a Numpy array created within a C Extension

I have an if clause within a for loop in which I have defined state_out beforehand with:
state_out = (PyArrayObject *) PyArray_FromDims(1,dims_new,NPY_BOOL);
And the if conditions are like this:
if (conn_ctr<sum*2){
*(state_out->data + i*state_out->strides[0]) = true;
}
else {
*(state_out->data + i*state_out->strides[0]) = false;
}
When I comment these out, state_out is returned as an all-False NumPy array. There is a problem with this assignment that I fail to see. As far as I know, all the members of the struct PyArrayObject used in this code are pointers, so after the pointer arithmetic, the result should point to the address I intend to write. (All if conditions in the code are built by reaching values in this manner, and I know it works, since I managed to printf the input arrays' values.) Then if I want to assign a bool to one of these locations in memory, I should assign it via *(pointer_intended) = true. What am I missing?
EDIT: I have noticed that I get the crash even if I don't touch those values and only put printf calls inside:
if (conn_ctr<sum*2){
printf("True!\n");
}
else {
printf("False!\n");
}
I get a SegFault again.
Thanks a lot. The rest of the code is here:
#include <Python.h>
#include "numpy/arrayobject.h"
#include <stdio.h>
#include <stdbool.h>
static PyObject* trace(PyObject *self, PyObject *args);
static char doc[] =
"This is the C extension for xor_masking routine. It interfaces with Python via C-Api, and calculates the "
"next state with C pointer arithmetic";
static PyMethodDef TraceMethods[] = {
{"trace", trace, METH_VARARGS, doc},
{NULL, NULL, 0, NULL}
};
PyMODINIT_FUNC
inittrace(void)
{
(void) Py_InitModule("trace", TraceMethods);
import_array();
}
static PyObject* trace(PyObject *self, PyObject *args){
PyObject *adjacency ,*mask, *state;
PyArrayObject *adjacency_arr, *mask_arr, *state_arr, *state_out;
if (!PyArg_ParseTuple(args,"OOO:trace", &adjacency, &mask, &state)) return NULL;
adjacency_arr = (PyArrayObject *)
PyArray_ContiguousFromObject(adjacency, NPY_BOOL,2,2);
if (adjacency_arr == NULL) return NULL;
mask_arr = (PyArrayObject *)
PyArray_ContiguousFromObject(mask, NPY_BOOL,2,2);
if (mask_arr == NULL) return NULL;
state_arr = (PyArrayObject *)
PyArray_ContiguousFromObject(state, NPY_BOOL,1,1);
if (state_arr == NULL) return NULL;
int dims[2], dims_new[1];
dims[0] = adjacency_arr -> dimensions[0];
dims[1] = adjacency_arr -> dimensions[1];
dims_new[0] = adjacency_arr -> dimensions[0];
if (!(dims[0]==dims[1] && mask_arr -> dimensions[0] == dims[0]
&& mask_arr -> dimensions[1] == dims[0]
&& state_arr -> dimensions[0] == dims[0]))
return NULL;
state_out = (PyArrayObject *) PyArray_FromDims(1,dims_new,NPY_BOOL);
int i,j;
for(i=0;i<dims[0];i++){
int sum = 0;
int conn_ctr = 0;
for(j=0;j<dims[1];j++){
bool adj_value = (adjacency_arr->data + i*adjacency_arr->strides[0]
+j*adjacency_arr->strides[1]);
if (*(bool *) adj_value == true){
bool mask_value = (mask_arr->data + i*mask_arr->strides[0]
+j*mask_arr->strides[1]);
bool state_value = (state_arr->data + j*state_arr->strides[0]);
if ( (*(bool *) mask_value ^ *(bool *)state_value) == true){
sum++;
}
conn_ctr++;
}
}
if (conn_ctr<sum*2){
}
else {
}
}
Py_DECREF(adjacency_arr);
Py_DECREF(mask_arr);
Py_DECREF(state_arr);
return PyArray_Return(state_out);
}

if (conn_ctr<sum*2){
*(state_out->data + i*state_out->strides[0]) = true;
}
else {
*(state_out->data + i*state_out->strides[0]) = false;
}
Here I naively do pointer arithmetic. state_out->data is a pointer to the beginning of the data, and it is defined as a pointer to char (SciPy Doc - Python Types and C-Structures):
typedef struct PyArrayObject {
PyObject_HEAD
char *data;
int nd;
npy_intp *dimensions;
npy_intp *strides;
...
} PyArrayObject;
I copied only a portion of it here. state_out->strides is a pointer to an array whose length is the number of dimensions of the array we have. This is a 1d array in this case. So when I do the pointer arithmetic (state_out->data + i*state_out->strides[0]), I certainly aim to calculate the pointer to the ith value of the array, but I failed to give the pointer a type.
I had tried :
NPY_BOOL *adj_value_ptr, *mask_value_ptr, *state_value_ptr, *state_out_ptr;
where the variables point to the values I am interested in within my for loop, and state_out_ptr is the one I write to. I had thought that since the elements of these arrays are of type NPY_BOOL, the pointers to the data within the arrays would be of type NPY_BOOL as well. This fails with a SegFault when manipulating the memory directly. The reason is that NPY_BOOL is an enum value, an integer that NumPy uses internally (as pv kindly stated in the comments). There is a C typedef, npy_bool, meant for boolean values in code (SciPy Docs). When I declared my pointers with the type
npy_bool *adj_value_ptr, *mask_value_ptr, *state_value_ptr, *state_out_ptr;
the segmentation fault disappeared, and I succeeded in manipulating and returning a NumPy array.
I'm not an expert, but this solved my issue. Please point it out if I'm wrong.
The part that has changed in the source code is:
state_out = (PyArrayObject *) PyArray_FromDims(1,dims_new,NPY_BOOL);
npy_bool *adj_value_ptr, *mask_value_ptr, *state_value_ptr, *state_out_ptr;
npy_intp i,j;
for(i=0;i<dims[0];i++){
npy_int sum = 0;
npy_int conn_ctr = 0;
for(j=0;j<dims[1];j++){
adj_value_ptr = (adjacency_arr->data + i*adjacency_arr->strides[0]
+j*adjacency_arr->strides[1]);
if (*adj_value_ptr == true){
mask_value_ptr = (mask_arr->data + i*mask_arr->strides[0]
+j*mask_arr->strides[1]);
state_value_ptr = (state_arr->data + j*state_arr->strides[0]);
if ( (*(bool *) mask_value_ptr ^ *(bool *)state_value_ptr) == true){
sum++;
}
conn_ctr++;
}
}
state_out_ptr = (state_out->data + i*state_out->strides[0]);
if (conn_ctr < sum*2){
*state_out_ptr = true;
}
else {
*state_out_ptr = false;
}
}
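The stride arithmetic used above (data + i*strides[0] + j*strides[1]) can be tried out without NumPy. The sketch below uses a standard-library memoryview over a bytearray as a stand-in for a C-contiguous array of one-byte booleans; element_offset, ROWS and COLS are hypothetical names, and the one-byte element size mirrors npy_bool.

```python
ROWS, COLS = 3, 4
buf = bytearray(ROWS * COLS)                        # flat storage, all zero
view = memoryview(buf).cast("B", shape=[ROWS, COLS])

def element_offset(strides, i, j):
    """Byte offset of element (i, j): i*strides[0] + j*strides[1]."""
    return i * strides[0] + j * strides[1]

offset = element_offset(view.strides, 2, 1)
buf[offset] = 1        # write a single byte, like *ptr = true on an npy_bool*
print(view.strides, offset, view[2, 1])
```

Because strides are counted in bytes, the base pointer has to be a byte pointer (char *) during the addition, and only the final address is treated as a pointer to the element type.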
