Calling Cython(C) functions with python functions as argument - python

I would like to integrate locally defined Python functions with the GSL libraries.
To do that, I have implemented the following code with Cython (example with Gauss-Legendre quadrature):
pxd file :
cdef extern from "gsl/gsl_math.h":
    ctypedef struct gsl_function:
        double (* function) (double x, void * params) nogil
        void * params
cdef extern from "gsl/gsl_integration.h":
    # opaque table type; this declaration is assumed (it was not shown in the original)
    ctypedef struct gsl_integration_glfixed_table:
        pass
    gsl_integration_glfixed_table * gsl_integration_glfixed_table_alloc(size_t n) nogil
    double gsl_integration_glfixed(gsl_function *f, double a, double b, gsl_integration_glfixed_table * t) nogil
    void gsl_integration_glfixed_table_free(gsl_integration_glfixed_table *t) nogil
cdef double int_gsl_GaussLegendre(double func(double, void *) nogil, void * p, double xmin, double xmax) nogil
and the pyx file :
cdef size_t size_GL=1000
cdef double int_gsl_GaussLegendre(double func(double, void *) nogil, void * p, double xmin, double xmax) nogil:
    cdef double result, error
    cdef gsl_integration_glfixed_table * W
    cdef gsl_function F
    W = gsl_integration_glfixed_table_alloc(size_GL)
    F.function = func
    F.params = p
    result = gsl_integration_glfixed(&F, xmin, xmax, W)
    gsl_integration_glfixed_table_free(W)
    return result
This code works for any C function declared within my Cython code. Of course, it fails when I pass a Python function as the argument.
My script in python :
def gsl_integral(py_func, xmin, xmax, args=()):
    cdef size_t sizep = <int>(args.size)
    cdef double[:] params = np.empty(sizep, dtype=np.double)
    for i in range(0,sizep):
        params[i]=args[i]
    cdef gsl_function F
    F.function = py_func
    F.params = params
which returns: "Cannot convert Python object to 'double (*)(double, void *) nogil'"
If I instead use:
def gsl_integral(py_func, xmin, xmax, args=()):
    cdef size_t sizep = <int>(args.size)
    cdef double[:] params = np.empty(sizep, dtype=np.double)
    for i in range(0,sizep):
        params[i]=args[i]
    cdef gsl_function F
    F.function = <double *>py_func
    F.params = params
which returns: "Python objects cannot be cast to pointers of primitive types"
I have seen that I could wrap my Cython function in a class (Pass c function as argument to python function in cython), but I'm not sure I understand how to do it in this situation (plus, the example doesn't work for me). As a workaround, I have been passing two arrays to Cython, x and f(x), the latter evaluated with my local Python f, in order to define a GSL spline that I can later integrate, but this is not elegant at all.
Is there any other way?
I would like to use GSL integration without the GIL
Many thanks,
Romain

There are a few fundamental issues with your basic premise:
C function pointers do not have any state. Python callable objects do have state. Therefore, C function pointers do not have space to store the information required to call a Python function.
If you're calling a Python callable object, you cannot release the GIL, since you must have the GIL to execute that object.
This answer outlines your options (the C++ std::function option doesn't apply here). Essentially you can either
use ctypes to generate a function pointer from the Python function (it does this with hacky code-generation at runtime - it isn't possible in standard C),
or write a cdef function with an appropriate signature to pass as the function pointer, and pass your Python callable to that as part of the void* params.
I recommend the latter, but neither is a great option.
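Both options can be sketched in plain Python with ctypes, which makes it easier to see where the state lives. This is only an illustration under assumptions: the names CALLBACK, make_c_callback, trampoline, integrand and square are made up here, the use of id() as an object address is a CPython detail, and nothing below calls GSL. In a real wrapper the second variant would be a cdef function in the .pyx file rather than a ctypes callback.

```python
import ctypes

# Same shape as gsl_function.function: double (*)(double x, void *params)
CALLBACK = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_void_p)


# Option 1: let ctypes build a C function pointer around a Python callable.
def make_c_callback(py_func, args=()):
    def _call(x, _params):
        # The state lives in the closure, so the void* argument is unused.
        return float(py_func(x, *args))
    return CALLBACK(_call)


# Option 2: one fixed callback that recovers the Python callable
# from the void* params (the trampoline idea).
@CALLBACK
def trampoline(x, params):
    # params arrives as an integer address; turn it back into the object.
    func = ctypes.cast(params, ctypes.py_object).value
    return float(func(x))


def integrand(x, a, b):
    # hypothetical integrand with two extra parameters
    return a * x + b


def square(x):
    return x * x


# Keep a reference to cb for as long as C code may call it.
cb = make_c_callback(integrand, args=(2.0, 1.0))
print(cb(3.0, None))

# CPython detail: id() gives the address of the object.
params = ctypes.c_void_p(id(square))
print(trampoline(4.0, params))
```

In both variants the callback has to hold the GIL while the Python function runs (ctypes acquires it for Python callbacks), so neither gives GIL-free integration of a Python integrand.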

Related

Call c++ template function from cython

Currently, I am learning how to call a C++ template function from Cython. I have a .h file named 'cos_doubles.h'. The file is as follows:
#ifndef _COS_DOUBLES_H
#define _COS_DOUBLES_H
#include <math.h>
template <typename T, int ACCURACY>
void cos_doubles(T * in_array, T * out_array, int size)
{
    int i;
    for(i=0;i<size;i++){
        out_array[i] = in_array[i] * 2;
    }
}
#endif
Indeed, the variable ACCURACY does nothing. Now I want to define a template function in Cython that uses this cos_doubles function but has only the typename T as a template parameter. In other words, I want to give ACCURACY a value in my Cython code. My .pyx code is something like the following:
# import both numpy and the Cython declarations for numpy
import numpy as np
cimport numpy as np
cimport cython
# if you want to use the Numpy-C-API from Cython
# (not strictly necessary for this example)
np.import_array()
# cdefine the signature of our c function
cdef extern from "cos_doubles.h":
    void cos_doubles[T](T* in_array, T* out_array, int size)
I know this code has errors, because I did not specify ACCURACY in void cos_doubles[T](T* in_array, T* out_array, int size), but I do not know the syntax for setting it. For example, I want to set ACCURACY = 4. Can anyone tell me how to do this?
One solution I already have is
cdef void cos_doubles1 "cos_doubles<double, 4>"(double * in_array, double * out_array, int size)
cdef void cos_doubles2 "cos_doubles<int, 4>"(int * in_array, int * out_array, int size)
but I do not want to define two different functions. Is there a better solution?

How to use gsl in cython

I am trying to write a cython function that can be called from a python script which uses c gsl library to calculate the spearman correlation and the respective p value using a t distribution. My unsuccessful .pyx file is as follows:
import numpy as np
cimport numpy as np
def spearmanr(cdef double v1, cdef double v2, cdef int N):
    cdef extern from "gsl/gsl_statistics_double.h":
        double gsl_stats_spearman(double data1[],size_t stride1,double data2[],size_t stride2, size_t n)
    cdef int strides = 1
    cdef int n = N
    cdef double r = gsl_stats_spearmanr(v1,strides,v2,strides,n)
    cdef double tstat=r*((n-2)/(1-r**2))**0.5
    cdef extern from "gsl/gsl_ranhist.h":
        double gsl_cdf_tdist_Q(double x, double nu)
    cdef double nu = N %% Do I need to Type Cast?
    cdef double pval=gsl_cdf_tdist_Q(stat,nu)
    return r,pal
when I try to compile this I get the following error:
running build_ext
cythoning spear_coxen.pyx to spear_coxen.c
Error compiling Cython file:
------------------------------------------------------------
...
import numpy as np
cimport numpy as np
def spermanr(cdef double v1, cdef double v2, cdef int N):
^
------------------------------------------------------------
spear_coxen.pyx:4:13: Expected an identifier, found 'cdef'
Error compiling Cython file:
------------------------------------------------------------
...
import numpy as np
cimport numpy as np
def spermanr(cdef double v1, cdef double v2, cdef int N):
^
------------------------------------------------------------
spear_coxen.pyx:4:25: Expected ')', found 'v1'
building 'spear_r' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/Anaconda2-2.5.0/envs/python27/include/python2.7 -c spear_coxen.c -o build/temp.linux-x86_64-2.7/spear_coxen.o
spear_coxen.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
error: command 'gcc' failed with exit status 1
I am really not sure if any of this is correct syntax, as my searches have not yielded anything but trivial examples. If anybody could offer assistance it would be much appreciated. Thank you.
You don't need to put cdef in your function arguments. I.e.
def spearmanr(cdef double v1, cdef double v2, cdef int N):
Should just be:
cpdef spearmanr(double v1, double v2, int N):
And move your GSL declarations above that function declaration, not inside of it. Then it can access the functions you've declared, e.g.
cdef extern from "gsl/gsl_statistics_double.h":
    double gsl_stats_spearman(double data1[],size_t stride1,double data2[],size_t stride2, size_t n)
Do the same with the other extern.
Also, your declarations look incorrect (I'm not looking at the GSL docs). If v1 and v2 are meant to be arrays, type the function arguments as NumPy memoryviews, double[:] v1 for a vector or double[:,:] v1 for a 2D array, rather than as scalar double, and pass a pointer to the first element (e.g. &v1[0]) where the C function expects double data1[]. Let me know if it doesn't compile after doing that for the data1 and data2 arguments.
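Once the wrapper compiles, a pure-Python reference is handy for checking its output. The following is a minimal sketch under assumptions, not the GSL algorithm: the helper names (average_ranks, pearson, spearman_with_t) are made up for illustration, tied values receive average ranks, and the p-value step (gsl_cdf_tdist_Q in the question) is left out. The t statistic uses the same formula as the question, r*sqrt((n-2)/(1-r**2)), so it is undefined when r is exactly 1 or -1.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_with_t(v1, v2):
    """Spearman correlation (Pearson on ranks) and the t statistic."""
    r = pearson(average_ranks(v1), average_ranks(v2))
    n = len(v1)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t


r, t = spearman_with_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0])
print(r, t)
```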

Cython fails to recognize overloaded constructor

I'm trying to compile the cyrand project using cython, but am running into a bizarre compile error when testing overloaded constructors. See this gist for the files in question.
From the gist, I can compile and run example.pyx just fine, which uses the default constructor:
import numpy as np
cimport numpy as np
cimport cython
include "random.pyx"
@cython.boundscheck(False)
def example(n):
    cdef int N = n
    cdef rng r
    cdef rng_sampler[double] * rng_p = new rng_sampler[double](r)
    cdef rng_sampler[double] rng = deref(rng_p)
    cdef np.ndarray[np.double_t, ndim=1] result = np.empty(N, dtype=np.double)
    for i in range(N):
        result[i] = rng.normal(0.0, 2.0)
    print result
    return result
^ this works and runs fine. An example run produces the following output:
$ python test_example.py
[ 0.47237842 3.153744849 3.6854932057 ]
Yet when I try to compile and run the test which used a constructor that takes an unsigned long as argument:
import numpy as np
cimport numpy as np
cimport cython
include "random.pyx"
@cython.boundscheck(False)
def example_seed(n, seed):
    cdef int N = n
    cdef unsigned long Seed = seed
    cdef rng r
    cdef rng_sampler[double] * rng_p = new rng_sampler[double](Seed)
    cdef rng_sampler[double] rng = deref(rng_p)
    cdef np.ndarray[np.double_t, ndim=1] result = np.empty(N, dtype=np.double)
    for i in range(N):
        result[i] = rng.normal(0.0, 2.0)
    print result
    return result
I get the following cython compiler error:
Error compiling Cython file:
-----------------------------------------------------------
...
cdef int N = n
cdef unsigned long Seed = seed
cdef rng_sampler[double] * rng_p = new rng_sampler[double](Seed)
----------------------------------------------------------
example/example_seed.pyx:15:67 Cannot assign type 'unsigned long' to 'mt19937'
I interpret this message, along with the fact that example.pyx compiles and produces a working example.so file, that cython cannot find (or manage) the rng_sampler constructor that takes an unsigned long as input. I've not used cython before, and my cpp is middling at best. Can anyone shed light on how to fix this simple problem?
python: 2.7.10 (Anaconda 2.0.1)
cython: 0.22.1
I resolved the error; it had to do with how Boost was installed. I had installed Boost via apt-get. After downloading and untarring Boost and changing the paths to Boost in setup.py, it works.

cython vs python different results within scipy.optimize.fsolve

I cythonized a function that I call a bunch of times in my code. The cython version and the original python code give me the same answers (within 1e-7 which I understand has something to do with cython vs. python types...not the question here but might be important).
I attempt to find the root of the function using scipy.optimize.fsolve(). The python version works fine, but the cython version diverges.
The code is pretty involved and has a big external file to prepare some of the arguments, so I can't post everything. I post the cython code. Full code is here.
def euler_outside(float b_prime, int index_b,
                  np.ndarray[np.double_t, ndim=1] b_grid, int index_y,
                  np.ndarray[np.double_t, ndim=1] y_grid,
                  np.ndarray[np.double_t, ndim=1] y_vec,
                  np.ndarray[np.double_t, ndim=2] pol_mat_b, float q,
                  np.ndarray[np.double_t, ndim=2] pol_mat_q,
                  np.ndarray[np.double_t, ndim=2] P, float beta,
                  int n_ygrid, int check=0):
    '''
    b_prime - the variable of interest. want to find b_prime that solves this
    function
    '''
    cdef double b, y, c, uc, e_ucp, eul_val
    cdef int i
    cdef np.ndarray[np.float64_t, ndim=1] uct, c_prime = np.zeros((n_ygrid,))
    b = b_grid[index_b]
    y = y_grid[index_y]
    # Get value of consumption today
    c = b + y - b_prime/q
    # Get possible values of consumption tomorrow
    if check:
        c_prime = b_prime + y_vec - b_grid[0]/q
    else:
        for i in range(n_ygrid):
            c_prime[i] = (b_prime + y_vec[i] -
                          (np.interp(b_prime, b_grid, pol_mat_b[:,i]) /
                           np.interp(b_prime, b_grid, pol_mat_q[:,i])))
    if c<0:
        return 1e10
    uc = utility_prime(c)
    uct = utility_prime(c_prime)
    e_ucp = np.inner( uct, P[index_y,:] )
    eul_val = uc - beta*q * e_ucp
    return eul_val
The python code is the same but w/out the cdef statements and type info on the arguments. I've checked to make sure the output is the same for the same input values, and it is. My question is why scipy's fsolve goes off the deep-end for one and not the other. I assume it's a problem with my cython?
Running python 2.7 from Anaconda. Compiling the extension module via pyximport.
As mentioned in the comments above, the reason for the discrepancy between the results from the Python and Cython versions is that in the Cython function, several of the inputs are declared as float, whereas the actual Python variables are double precision.
The resulting increase in round-off error for the Cython function seems to be the reason why fsolve fails to converge - when these inputs are declared as double instead, the Python and Cython versions yield the exact same result, and fsolve converges correctly for both.
As an aside, cases where round-off error in the objective function prevents convergence are indicative of ill-conditioned problems. You might want to think about whether it's possible to re-formulate your model in order to improve its numerical stability.
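The size of the error that a float declaration introduces can be seen without Cython. This is a small sketch using only the standard library: it round-trips a Python float (a C double) through a C float, which is what happens to an argument declared as float. The helper name to_float32 and the value 0.96 are made up for illustration and are not taken from the original model.

```python
import struct


def to_float32(x):
    """Round a Python float (C double) to the nearest C float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


q = 0.96  # hypothetical input value
q32 = to_float32(q)

# Relative error caused purely by storing q in single precision
rel_err = abs(q32 - q) / q
print(q32, rel_err)
```

The relative error is bounded by 2**-24 (about 6e-8), which is consistent with the 1e-7 level of disagreement mentioned in the question.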

Force NumPy ndarray to take ownership of its memory in Cython

Following this answer to "Can I force a numpy ndarray to take ownership of its memory?" I attempted to use the Python C API function PyArray_ENABLEFLAGS through Cython's NumPy wrapper and found it is not exposed.
The following attempt to expose it manually (this is just a minimum example reproducing the failure)
from libc.stdlib cimport malloc
import numpy as np
cimport numpy as np
np.import_array()
ctypedef np.int32_t DTYPE_t
cdef extern from "numpy/ndarraytypes.h":
    void PyArray_ENABLEFLAGS(np.PyArrayObject *arr, int flags)
def test():
    cdef int N = 1000
    cdef DTYPE_t *data = <DTYPE_t *>malloc(N * sizeof(DTYPE_t))
    cdef np.ndarray[DTYPE_t, ndim=1] arr = np.PyArray_SimpleNewFromData(1, &N, np.NPY_INT32, data)
    PyArray_ENABLEFLAGS(arr, np.NPY_ARRAY_OWNDATA)
fails with a compile error:
Error compiling Cython file:
------------------------------------------------------------
...
def test():
cdef int N = 1000
cdef DTYPE_t *data = <DTYPE_t *>malloc(N * sizeof(DTYPE_t))
cdef np.ndarray[DTYPE_t, ndim=1] arr = np.PyArray_SimpleNewFromData(1, &N, np.NPY_INT32, data)
PyArray_ENABLEFLAGS(arr, np.NPY_ARRAY_OWNDATA)
^
------------------------------------------------------------
/tmp/test.pyx:19:27: Cannot convert Python object to 'PyArrayObject *'
My question: Is this the right approach to take in this case? If so, what am I doing wrong? If not, how do I force NumPy to take ownership in Cython, without going down to a C extension module?
You just have some minor errors in the interface definition. The following worked for me:
from libc.stdlib cimport malloc
import numpy as np
cimport numpy as np
np.import_array()
ctypedef np.int32_t DTYPE_t
cdef extern from "numpy/arrayobject.h":
    void PyArray_ENABLEFLAGS(np.ndarray arr, int flags)
cdef data_to_numpy_array_with_spec(void * ptr, np.npy_intp N, int t):
    cdef np.ndarray[DTYPE_t, ndim=1] arr = np.PyArray_SimpleNewFromData(1, &N, t, ptr)
    PyArray_ENABLEFLAGS(arr, np.NPY_OWNDATA)
    return arr
def test():
    N = 1000
    cdef DTYPE_t *data = <DTYPE_t *>malloc(N * sizeof(DTYPE_t))
    arr = data_to_numpy_array_with_spec(data, N, np.NPY_INT32)
    return arr
This is my setup.py file:
from distutils.core import setup, Extension
from Cython.Distutils import build_ext
ext_modules = [Extension("_owndata", ["owndata.pyx"])]
setup(cmdclass={'build_ext': build_ext}, ext_modules=ext_modules)
Build with python setup.py build_ext --inplace. Then verify that the data is actually owned:
import _owndata
arr = _owndata.test()
print arr.flags
Among others, you should see OWNDATA : True.
And yes, this is definitely the right way to deal with this, since numpy.pxd does exactly the same thing to export all the other functions to Cython.
@Stefan's solution works for most scenarios, but is somewhat fragile. NumPy uses PyDataMem_NEW/PyDataMem_FREE for memory management, and it is an implementation detail that these calls are mapped to the usual malloc/free plus some memory tracing (I don't know what effect Stefan's solution has on the memory tracing; at least it does not seem to crash).
There are also more esoteric cases possible, in which free from numpy-library doesn't use the same memory-allocator as malloc in the cython code (linked against different run-times for example as in this github-issue or this SO-post).
The right tool to pass/manage the ownership of the data is PyArray_SetBaseObject.
First we need a Python object that is responsible for freeing the memory. I'm using a self-made cdef class here (mostly for logging/demonstration), but there are obviously other possibilities as well:
%%cython
from libc.stdlib cimport free
cdef class MemoryNanny:
    cdef void* ptr # set to NULL by "constructor"
    def __dealloc__(self):
        print("freeing ptr=", <unsigned long long>(self.ptr)) #just for debugging
        free(self.ptr)
    @staticmethod
    cdef create(void* ptr):
        cdef MemoryNanny result = MemoryNanny()
        result.ptr = ptr
        print("nanny for ptr=", <unsigned long long>(result.ptr)) #just for debugging
        return result
...
Now, we use a MemoryNanny-object as sentinel for the memory, which gets freed as soon as the parent-numpy-array gets destroyed. The code is a little bit awkward, because PyArray_SetBaseObject steals the reference, which is not handled by Cython automatically:
%%cython
...
from cpython.object cimport PyObject
from cpython.ref cimport Py_INCREF
cimport numpy as np
#needed to initialize PyArray_API in order to be able to use it
np.import_array()
cdef extern from "numpy/arrayobject.h":
    # a little bit awkward: the reference to obj will be stolen
    # using PyObject* to signal that Cython cannot handle it automatically
    int PyArray_SetBaseObject(np.ndarray arr, PyObject *obj) except -1 # -1 means there was an error
cdef array_from_ptr(void * ptr, np.npy_intp N, int np_type):
    cdef np.ndarray arr = np.PyArray_SimpleNewFromData(1, &N, np_type, ptr)
    nanny = MemoryNanny.create(ptr)
    Py_INCREF(nanny) # a reference will get stolen, so prepare nanny
    PyArray_SetBaseObject(arr, <PyObject*>nanny)
    return arr
...
And here is an example, how this functionality can be called:
%%cython
...
from libc.stdlib cimport malloc
def create():
    cdef double *ptr=<double*>malloc(sizeof(double)*8);
    ptr[0]=42.0
    return array_from_ptr(ptr, 8, np.NPY_FLOAT64)
which can be used as follows:
>>> m = create()
nanny for ptr= 94339864945184
>>> m.flags
...
OWNDATA : False
...
>>> m[0]
42.0
>>> del m
freeing ptr= 94339864945184
with results/output as expected.
Note: the resulting array doesn't really own the data (i.e. flags reports OWNDATA : False), because the memory is owned by the memory nanny, but the result is the same: the memory gets freed as soon as the array is deleted (because nobody holds a reference to the nanny anymore).
MemoryNanny doesn't have to guard a raw C-pointer. It can be anything else, for example also a std::vector:
%%cython -+
from libcpp.vector cimport vector
cdef class VectorNanny:
    #automatically default initialized/destructed by Cython:
    cdef vector[double] vec
    @staticmethod
    cdef create(vector[double]& vec):
        cdef VectorNanny result = VectorNanny()
        result.vec.swap(vec) # swap and not copy
        return result
# for testing:
def create_vector(int N):
    cdef vector[double] vec;
    vec.resize(N, 2.0)
    return VectorNanny.create(vec)
The following test shows, that the nanny works:
nanny=create_vector(10**8) # top shows additional 800MB memory are used
del nanny # top shows, this additional memory is no longer used.
The latest Cython version allows you to do this with minimal syntax, albeit with slightly more overhead than the lower-level solutions suggested.
numpy_array = np.asarray(<np.int32_t[:10, :10]> my_pointer)
https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#coercion-to-numpy
This alone does not pass ownership.
Notably, a Cython array is generated with this call, via array_cwrapper.
This generates a cython.array, without allocating memory. The cython.array uses the stdlib.h malloc and free by default, so it would be expected that you use the default malloc, as well, instead of any special CPython/Numpy allocators.
free is only called if ownership is set for this cython.array, which it is by default only if it allocates data. For our case, we can manually set it via:
my_cyarr.free_data = True
So to return a 1D array, it would be as simple as:
from cython.view cimport array as cvarray
# ...
cdef cvarray cvarr = <np.int32_t[:N]> data
cvarr.free_data = True
return np.asarray(cvarr)
