I'm trying to build a Cython slice-sampling library: a generic one, where you supply a log-density and a starter value and get a sample back. I'm working on the univariate case now. Based on the response here, I've come up with the following.
So I have a function defined in cSlice.pyx:
cdef double univariate_slice_sample(f_type_1 logd, double starter,
                                    double increment_size = 0.5):
    # some stuff
    return value
I have defined in cSlice.pxd:
ctypedef double (*f_type_1)(double)

cdef double univariate_slice_sample(f_type_1 logd, double starter,
                                    double increment_size = *)
where logd is a generic univariate log-density.
In my distribution file, let's say cDistribution.pyx, I have the following:
from cSlice cimport univariate_slice_sample, f_type_1
cdef double log_distribution(alpha_k, y_k, prior):
    # some stuff
    return value
cdef double _sample_alpha_k_slice(
        double starter,
        double[:] y_k,
        Prior prior,
        double increment_size
        ):
    cdef f_type_1 f = lambda alpha_k: log_distribution(alpha_k, y_k, prior)
    return univariate_slice_sample(f, starter, increment_size)
cpdef double sample_alpha_k_slice(
        double starter,
        double[:] y_k,
        Prior prior,
        double increment_size = 0.5
        ):
    return _sample_alpha_k_slice(starter, y_k, prior, increment_size)
The wrapper exists because apparently lambdas aren't allowed in cpdef functions.
When I try compiling the distribution file, I get the following:
cDistribution.pyx:289:22: Cannot convert Python object to 'f_type_1'
pointing at the cdef f_type_1 f = ... line.
I'm unsure of what else to do. I want this code to maintain C speed, and importantly not hit the GIL. Any ideas?
You can JIT a C callback/wrapper for any Python function (a cast from a Python object to a pointer cannot be done implicitly), as explained for example in this SO-post.
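For reference, a minimal sketch of that wrapping approach (the names here are hypothetical, not taken from the linked post): stash the Python callable in a module-level variable and expose a cdef function with the required C signature that forwards to it.

_py_logd = None  # module-level holder for the Python callable

def set_logd(f):
    global _py_logd
    _py_logd = f

cdef double _logd_wrapper(double x):
    # has the right C signature, but each call still goes through the
    # Python layer (and therefore needs the GIL)
    return _py_logd(x)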
However, at its core the function will remain a slow, pure Python function. Numba gives you the possibility to create real C callbacks via @cfunc. Here is a simplified example:
from numba import cfunc

@cfunc("float64(float64)")
def id_(x):
    return x
and this is how it could be used:
%%cython
ctypedef double(*f_type)(double)

cdef void c_print_double(double x, f_type f):
    print(2.0*f(x))

import numba
expected_signature = numba.float64(numba.float64)

def print_double(double x, f):
    # check the signature of f:
    if not f._sig == expected_signature:
        raise TypeError("cfunc has not the right type")
    # it is not possible to cast a Python object to a pointer directly,
    # so we cast the address first to unsigned long long
    c_print_double(x, <f_type><unsigned long long int>(f.address))
And now:
print_double(1.0, id_)
# 2.0
We need to check the signature of the cfunc object at run time; otherwise the cast <f_type><unsigned long long int>(f.address) would also "work" for functions with the wrong signature, only to (possibly) crash during the call or produce funny, hard-to-debug errors. I'm just not sure that my method of checking is the best, though, even if it works:
...

@cfunc("float32(float32)")
def id3_(x):
    return x

print_double(1.0, id3_)
# TypeError: cfunc has not the right type
Related
I am writing a Cython wrapper for the NAG C library.
In one of the header files from the NAG C library is the macro:
#define NAG_FREE(x) x04bdc((Pointer *)&(x))
Pointer is void*
x04bdc is:
extern void NAG_CALL x04bdc(Pointer *ptr);
NAG_FREE is the NAG library equivalent of free(), to free up memory.
Here is the extract from my lib_nag_integrate.pxd file:
cdef extern from "<nagx04.h>":
void x04bdc(Pointer *ptr)
x04bdc is a "fancy" free routine (for memory the library malloc'd). I can't access its code.
I then create a cdef function in my .pyx file:
cdef void NAG_FREE(double *x):
    x04bdc(<Pointer *>&x)
Here I have cast x to a double pointer, as that is what I am trying to free; however, the NAG library examples seem to use NAG_FREE for any type of pointer.
When running the Python script, which calls a cpdef function that eventually uses NAG_FREE, I get the following error:
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
If I comment out the NAG_FREE calls it works fine; however, NAG says it is necessary to use NAG_FREE.
The cdef function using NAG_FREE is:
cdef (double,double,Integer,Integer) dim1_fin_gen(lib_nag_integrate.NAG_D01SJC_FUN objfun, double a, double b,
                                                  double epsabs, double epsrel,
                                                  Integer max_num_subint, Nag_User *comm, integration_out *output):
    """
    :param objfun: user function
    :type objfun: function
    :param a: lower limit of integration
    :type a: real float
    :param b: upper limit of integration
    :type b: real float
    :param epsabs: user requested absolute error
    :type epsabs: real float
    :param epsrel: user requested relative error
    :type epsrel: real float
    :param max_num_subint: maximum number of subintervals
    :type max_num_subint: integer
    :return: integration value of user function objfun
    :rtype: real float
    """
    cdef lib_nag_integrate.Nag_QuadProgress _qp
    cdef lib_nag_integrate.NagError _fail
    cdef double result
    cdef double abserr
    _fail.print = True
    _fail.code = 0
    _fail.errnum = 0
    _fail.handler = NULL
    lib_nag_integrate.d01sjc(objfun, a, b, epsabs, epsrel,
                             max_num_subint, &result, &abserr,
                             &_qp, comm, &_fail)
    if _fail.code > 0:
        errorMessage = _fail.message
        raise NagException(_fail.code, errorMessage)
        print(_fail.message)
    else:
        output[0].result = result
        output[0].abserr = abserr
        output[0].fun_count = _qp.fun_count
        output[0].num_subint = _qp.num_subint
        NAG_FREE(_qp.sub_int_beg_pts)
        NAG_FREE(_qp.sub_int_end_pts)
        NAG_FREE(_qp.sub_int_result)
        NAG_FREE(_qp.sub_int_error)
My lib_nag_integrate.pxd header file imports the following from the C library:
cdef extern from "<nag_types.h>":
ctypedef bint Nag_Boolean
ctypedef long Integer
ctypedef void* Pointer
ctypedef struct NagError:
int code
bint print "print"
char message[512]
Integer errnum
void (*handler)(char*,int,char*)
ctypedef struct Nag_User:
Pointer p
ctypedef struct Nag_QuadProgress:
Integer num_subint
Integer fun_count
double *sub_int_beg_pts
double *sub_int_end_pts
double *sub_int_result
double *sub_int_error
cdef extern from "<nagx04.h>":
(void*) NAG_ALLOC "x04bjc" (size_t size)
void x04bdc(Pointer *ptr)
cdef extern from "<nagd01.h>":
void d01sjc(NAG_D01SJC_FUN f, double a, double b,
double epsabs, double epsrel, Integer max_num_subint, double *result,
double *abserr, Nag_QuadProgress *qp, Nag_User *comm,
NagError *fail)
d01sjc is an integration routine whose code I cannot access. It allocates the memory for qp.sub_int_beg_pts etc. internally.
I think I have a corrupt pointer causing the error. If so, where is it and how do I fix it?
Many thanks.
Upon further inspection of the structure _qp, the same error occurs when dereferencing, e.g.:
x = _qp.sub_int_end_pts[0]
so it is the dereferencing of _qp's pointers which is causing the error.
The struct type Nag_QuadProgress is imported from its NAG header file into my .pxd as follows:
cdef extern from "<nag_types.h>":
ctypedef struct Nag_QuadProgress:
Integer num_subint
Integer fun_count
double *sub_int_beg_pts
double *sub_int_end_pts
double *sub_int_result
double *sub_int_error
Any ideas why dereferencing the pointers in this structure causes the error?
From Cython's point of view you're using NAG_FREE as a function, so that's what you should declare it as. It doesn't really matter that it's actually a macro, and it certainly doesn't help to attempt to reimplement it.
cdef extern from "whatever_the_nag_header_is":
void NAG_FREE(Pointer x)
# or
void NAG_FREE(void *x)
# or
void NAG_FREE(...) # accepts anything - Cython doesn't attempt to match types
You may have to play around a bit with the type of the arguments to get it to work - I've suggested three options.
Really all you're aiming to do is to give Cython enough information that it can generate the right C code, and the right C code is NAG_FREE(your_variable), as if it's a function call.
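For instance, a sketch reusing the extern block from the question (header name taken from there):

cdef extern from "<nagx04.h>":
    void x04bdc(Pointer *ptr)
    void NAG_FREE(void *x)  # the macro, declared as if it were a plain function

With that declaration, a call like NAG_FREE(_qp.sub_int_beg_pts) compiles to a literal NAG_FREE(...) in the generated C, which the preprocessor then expands.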
With your code:
(<integration_out*>output)[0] suggests you're doing something very wrong. output is already an integration_out pointer, so why are you casting it? It either does nothing or introduces a potential error.
Despite claiming to return a C tuple type, you actually don't bother to return anything.
I need to get an overview of the performance one can get from using Cython in high-performance numerical code. One of the things I am interested in is finding out whether an optimizing C compiler can vectorize code generated by Cython. So I decided to write the following small example:
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef int f(np.ndarray[int, ndim=1] f):
    cdef int array_length = f.shape[0]
    cdef sum = 0
    cdef int k
    for k in range(array_length):
        sum += f[k]
    return sum
I know that there are NumPy functions that do the job, but I would like to have simple code in order to understand what is possible with Cython. It turns out that the extension built with:
from distutils.core import setup
from Cython.Build import cythonize
setup(ext_modules = cythonize("sum.pyx"))
and called with:
python setup.py build_ext --inplace
generates C code which looks like this for the loop:
for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_1; __pyx_t_2 += 1) {
    __pyx_v_sum = __pyx_v_sum + (*(int *)((char *)
        __pyx_pybuffernd_f.rcbuffer->pybuffer.buf +
        __pyx_t_2 * __pyx_pybuffernd_f.diminfo[0].strides));
}
The main problem with this code is that the compiler does not know at compile time that __pyx_pybuffernd_f.diminfo[0].strides is such that the elements of the array are close together in memory. Without that information, the compiler cannot vectorize efficiently.
Is there a way to do such a thing from Cython?
You have two problems in your code (use option -a to make them visible):
The indexing of the numpy array isn't efficient.
You have forgotten int in cdef sum = 0.
Taking this into account we get:
cpdef int f(np.ndarray[np.int_t] f):  ##HERE
    assert f.dtype == np.int
    cdef int array_length = f.shape[0]
    cdef int sum = 0  ##HERE
    cdef int k
    for k in range(array_length):
        sum += f[k]
    return sum
For the loop, the following code is generated:
int __pyx_t_5;
int __pyx_t_6;
Py_ssize_t __pyx_t_7;
....
__pyx_t_5 = __pyx_v_array_length;
for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6 += 1) {
    __pyx_v_k = __pyx_t_6;
    __pyx_t_7 = __pyx_v_k;
    __pyx_v_sum = (__pyx_v_sum + (*__Pyx_BufPtrStrided1d(__pyx_t_5numpy_int_t *, __pyx_pybuffernd_f.rcbuffer->pybuffer.buf, __pyx_t_7, __pyx_pybuffernd_f.diminfo[0].strides)));
}
This is not that bad, but not as easy for the optimizer as normal code written by a human. As you have already pointed out, __pyx_pybuffernd_f.diminfo[0].strides isn't known at compile time, and this prevents vectorization.
However, you get better results when using typed memory views, i.e.:
cpdef int mf(int[::1] f):
    cdef int array_length = len(f)
    ...
which leads to less opaque C code that, at least for my compiler, is easier to optimize:
__pyx_t_2 = __pyx_v_array_length;
for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3 += 1) {
    __pyx_v_k = __pyx_t_3;
    __pyx_t_4 = __pyx_v_k;
    __pyx_v_sum = (__pyx_v_sum + (*((int *) ( /* dim=0 */ ((char *) (((int *) __pyx_v_f.data) + __pyx_t_4)) ))));
}
The most crucial thing here is that we make it clear to Cython that the memory is contiguous, i.e. int[::1] rather than int[:], for which a possible stride != 1 must be taken into account (as is the case for generic numpy arrays).
In this case, the Cython-generated C code compiles to the same assembly as the code I would have written by hand. As crisb has pointed out, adding -march=native leads to vectorization, though the assembly of the two functions would then differ slightly again.
However, in my experience compilers quite often have problems optimizing loops created by Cython, and/or it is easy to miss a detail which prevents the generation of really good C code. So my strategy for workhorse loops is to write them in plain C and use Cython for wrapping/accessing them; this is often somewhat faster, because one can use dedicated compiler flags for this code snippet without affecting the whole Cython module.
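As an illustration of that strategy, a sketch (file and function names are hypothetical):

# sum_impl.h / sum_impl.c contain the plain-C summation loop and can be
# compiled with their own dedicated flags (e.g. -O3 -march=native)
cdef extern from "sum_impl.h":
    int c_sum(const int *data, int n)

cpdef int mf_c(int[::1] f):
    # hand the contiguous buffer straight to the C routine
    return c_sum(&f[0], <int>len(f))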
I am fairly new to Cython, so this is probably fairly trivial, but I haven't been able to find the answer anywhere.
I've defined a struct type and I want to write a function that will initialize all the fields properly and return a pointer to the new struct.
from cpython.mem import PyMem_Malloc

ctypedef struct cell_t:
    DTYPE_t[2] min_bounds
    DTYPE_t[2] max_bounds
    DTYPE_t size
    bint is_leaf
    cell_t * children[4]
    DTYPE_t[2] center_of_mass
    UINT32_t count
cdef cell_t * make_cell(DTYPE_t[2] min_bounds, DTYPE_t[2] max_bounds):
    cdef cell_t * cell = <cell_t *>PyMem_Malloc(sizeof(cell_t))  # <- Fails here
    if not cell:
        MemoryError()
    cell.min_bounds[:] = min_bounds
    cell.max_bounds[:] = max_bounds
    cell.size = min_bounds[0] - max_bounds[0]
    cell.is_leaf = True
    cell.center_of_mass[:] = [0, 0]
    cell.count = 0
    return cell
However, when I try to compile this, I get the following two errors:
cdef cell_t * make_cell(DTYPE_t[2] min_bounds, DTYPE_t[2] max_bounds):
cdef cell_t * cell = <cell_t *>PyMem_Malloc(sizeof(cell_t))
^
Casting temporary Python object to non-numeric non-Python type
------------------------------------------------------------
cdef cell_t * make_cell(DTYPE_t[2] min_bounds, DTYPE_t[2] max_bounds):
cdef cell_t * cell = <cell_t *>PyMem_Malloc(sizeof(cell_t))
^
Storing unsafe C derivative of temporary Python reference
------------------------------------------------------------
Now, I've looked all over, and from what I can gather, cell is actually stored in a temporary variable that gets deallocated at the end of the function.
Any help would be greatly appreciated.
cell.min_bounds[:] = min_bounds
This doesn't do what you think it does (although I'm not 100% sure what it does do). You need to copy arrays element by element:
cell.min_bounds[0] = min_bounds[0]
cell.min_bounds[1] = min_bounds[1]
Same for max_bounds.
The line that I suspect is giving you that error message is:
cell.center_of_mass[:] = [0, 0]
This is trying to assign a Python list to a C array (remembering that arrays and pointers are somewhat interchangeable in C), which doesn't make much sense. Again, you'd do
cell.center_of_mass[0] = 0
cell.center_of_mass[1] = 0
All this is largely consistent with the C behaviour that there aren't operators to copy whole arrays into each other, you need to copy element by element.
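If copying element by element gets tedious, one alternative (a sketch, assuming the fixed-size DTYPE_t[2] members from the struct above) is libc's memcpy:

from libc.string cimport memcpy

# each member is a flat C array whose size is known at compile time
memcpy(cell.min_bounds, min_bounds, 2 * sizeof(DTYPE_t))
memcpy(cell.max_bounds, max_bounds, 2 * sizeof(DTYPE_t))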
Edit:
However, that's not your immediate problem. You imported PyMem_Malloc rather than cimporting it, so it's treated as a Python function. You should do
from cpython.mem cimport PyMem_Malloc
Make sure it's cimported, not imported
Edit2:
The following compiles fine for me:
from cpython.mem cimport PyMem_Malloc

ctypedef double DTYPE_t

ctypedef struct cell_t:
    DTYPE_t[2] min_bounds
    DTYPE_t[2] max_bounds

cdef cell_t * make_cell(DTYPE_t[2] min_bounds, DTYPE_t[2] max_bounds) except NULL:
    cdef cell_t * cell = <cell_t *>PyMem_Malloc(sizeof(cell_t))
    if not cell:
        raise MemoryError()
    return cell
I've cut down cell_t a bit (just to avoid having to declare UINT32_t). I've also given the cdef function an except NULL to allow it to signal an error if needed, and added a raise before MemoryError(). I don't think either of these changes is directly related to your error.
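One follow-up worth noting: memory obtained from PyMem_Malloc must eventually be released with PyMem_Free. A hypothetical counterpart to make_cell could look like:

from cpython.mem cimport PyMem_Free

cdef void free_cell(cell_t *cell):
    # PyMem_Free(NULL) is a no-op, so no NULL check is needed
    PyMem_Free(cell)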
I am trying to build a wrapper for a camera driver written in C, using Cython. I am new to Cython (I started two weeks ago). After a lot of struggle, I managed to develop wrappers for structures and 1D arrays, but now I am stuck on 2D arrays.
One of the camera's C APIs takes a 2D array pointer as input and assigns the captured image to it. This function needs to be called from Python, and the output image needs to be processed/displayed in Python. After going through the Cython docs and various Stack Overflow posts, I only ended up more confused. I could not figure out how to pass 2D arrays between Python and C. The driver API looks (somewhat) like this:
driver.h
void assign_values2D(double **matrix, unsigned int row_size, unsigned int column_size);
c_driver.pxd
cdef extern from "driver.h":
void assign_values2D(double **matrix, unsigned int row_size, unsigned int column_size)
test.pyx
from c_driver cimport assign_values2D
import numpy as np
cimport numpy as np
cimport cython
from libc.stdlib cimport malloc, free
import ctypes

@cython.boundscheck(False)
@cython.wraparound(False)
def assignValues2D(np.ndarray[np.double_t, ndim=2, mode='c'] mat):
    row_size, column_size = np.shape(mat)
    cdef np.ndarray[double, ndim=2, mode="c"] temp_mat = np.ascontiguousarray(mat, dtype=ctypes.c_double)
    cdef double ** mat_pointer = <double **>malloc(row_size * sizeof(double*))
    if not mat_pointer:
        raise MemoryError
    try:
        for i in range(row_size):
            mat_pointer[i] = &temp_mat[i, 0]
        assign_values2D(<double **> &mat_pointer[0], row_size, column_size)
        return np.array(mat)
    finally:
        free(mat_pointer)
test_camera.py
b = np.zeros((5,5), dtype=np.float64)  # sample code
print("B Before = ")
print(b)
assignValues2D(b)
print("B After = ")
print(b)
When compiled, it gives the error:
Error compiling Cython file:
------------------------------------------------------------
...
if not mat_pointer:
raise MemoryError
try:
for i in range(row_size):
mat_pointer[i] = &temp_mat[i, 0]
^
------------------------------------------------------------
test.pyx:120:21: Cannot take address of Python variable
In fact, the above code was taken from a Stack Overflow post. I have tried several other ways, but none of them worked. Please let me know how I can get the 2D image into Python. Thanks in advance.
You need to type i:
cdef int i
(Alternatively you can type row_size and it also works)
Once it knows that i is an int, it can work out the type that indexing temp_mat gives, and so the & operator works.
Normally Cython is pretty good about figuring out the type of loop variables like i, but I think the issue here is that it can't deduce the type of row_size, so it decides it can't deduce the type of i, since that is deduced from range(row_size). Because of that, it can't deduce the type of temp_mat[i, 0].
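As a sketch, the alternative fix (typing row_size so that the type of i can be deduced from range(row_size)) would be:

cdef Py_ssize_t row_size, column_size
row_size, column_size = np.shape(mat)  # i in range(row_size) now gets a C type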
I suspect you also want to change the return statement to return np.array(temp_mat); the code you have will likely work most of the time, but occasionally np.ascontiguousarray will have to make a copy, and mat won't be changed.
I want to compile a Python function with Cython that reads a binary file while skipping some records (without reading the whole file and then slicing, as I would run out of memory). I came up with something like this:
def FromFileSkip(fid, count=1, skip=0):
    if skip >= 0:
        data = numpy.zeros(count)
        k = 0
        while k < count:
            try:
                data[k] = numpy.fromfile(fid, count=1, dtype=dtype)
                fid.seek(skip, 1)
                k += 1
            except ValueError:
                data = data[:k]
                break
        return data
and then I can use the function like this:
f = open(filename)
data = FromFileSkip(f,...
However, to compile the function FromFileSkip with Cython, I would like to declare the types of everything involved, including fid, the file handle. How can I declare its type in Cython, given that it is not a "standard" type such as an integer?
Thanks.
Declaring the type of fid won't help, because calling Python functions is still costly. Try compiling your example with the -a flag to see what I mean. However, you can use low-level C functions for file handling to avoid the Python overhead in your loop. For the sake of the example, I assume that the data starts right at the beginning of the file and that its type is double:
from libc.stdio cimport *

cdef extern from "stdio.h":
    FILE *fdopen(int, const char *)

import numpy as np
cimport numpy as np

DTYPE = np.double             # or whatever your type is
ctypedef np.double_t DTYPE_t  # or whatever your type is

def FromFileSkip(fid, int count=1, int skip=0):
    cdef int k
    cdef FILE* cfile
    cdef np.ndarray[DTYPE_t, ndim=1] data
    cdef DTYPE_t* data_ptr
    cfile = fdopen(fid.fileno(), b'rb')  # attach the stream
    data = np.zeros(count).astype(DTYPE)
    data_ptr = <DTYPE_t*>data.data
    # maybe skip some header bytes here
    # ...
    for k in range(count):
        # fread returns the number of items read: 1 on success, 0 on EOF/error
        if fread(<void*>(data_ptr + k), sizeof(DTYPE_t), 1, cfile) != 1:
            break
        if fseek(cfile, skip, SEEK_CUR):
            break
    return data
Note that the output of cython -a example.pyx shows no Python overhead inside the loop.
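Hypothetical usage, mirroring the question (file name and numbers invented):

f = open("data.bin", "rb")
data = FromFileSkip(f, count=100, skip=8)  # read 100 doubles, skipping 8 bytes after each
f.close()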