I want to compile a Python function with Cython for reading a binary file while skipping some records (without reading the whole file and then slicing, as I would run out of memory). I can come up with something like this:
import numpy
def FromFileSkip(fid, count=1, skip=0, dtype=numpy.float64):
    if skip>=0:
        data = numpy.zeros(count, dtype=dtype)
        k = 0
        while k<count:
            try:
                data[k] = numpy.fromfile(fid, count=1, dtype=dtype)
                fid.seek(skip, 1)
                k +=1
            except ValueError:
                # fromfile returned an empty array: end of file reached
                data = data[:k]
                break
        return data
and then I can use the function like this:
f = open(filename, 'rb')  # binary mode
data = FromFileSkip(f,...
However, to compile the function "FromFileSkip" with Cython, I would like to declare the types of everything involved in the function, including "fid", the file object. How can I declare its type in Cython, given that it is not a "standard" type such as an integer?
Thanks.
Declaring the type of fid won't help, because calling Python functions (numpy.fromfile, fid.seek) inside the loop is still costly. Try compiling your example with the -a flag to see what I mean. However, you can use the low-level C file-handling functions to avoid Python overhead in your loop. For the sake of example, I assumed that the data starts right at the beginning of the file and that its type is double:
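from libc.stdio cimport FILE, fread, fseek, fclose, SEEK_CUR
from posix.unistd cimport dup
cdef extern from "stdio.h":
    FILE *fdopen(int, const char *)
import numpy as np
cimport numpy as np
DTYPE = np.double             # or whatever your type is
ctypedef np.double_t DTYPE_t  # or whatever your type is
def FromFileSkip(fid, int count=1, int skip=0):
    cdef int k
    cdef int n_read = 0
    cdef FILE* cfile
    cdef np.ndarray[DTYPE_t, ndim=1] data
    cdef DTYPE_t* data_ptr
    # attach a C stream to a duplicate of the descriptor, so that the
    # fclose() below does not close the Python file object's descriptor
    cfile = fdopen(dup(fid.fileno()), b'rb')
    if cfile == NULL:
        raise OSError("fdopen failed")
    data = np.zeros(count, dtype=DTYPE)
    data_ptr = <DTYPE_t*>data.data
    # maybe skip some header bytes here
    # ...
    for k in range(count):
        # fread returns the number of items read (a size_t, never < 0)
        if fread(<void*>(data_ptr + k), sizeof(DTYPE_t), 1, cfile) != 1:
            break
        n_read += 1
        if fseek(cfile, skip, SEEK_CUR):
            break
    fclose(cfile)
    return data[:n_read]  # drop the slots that were never filled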
Note that the output of cython -a example.pyx shows no Python overhead inside the loop.
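If the main concern is memory rather than the Python loop itself, NumPy can also do this without Cython: memory-map the file and build a strided view over it, so only the pages that are actually touched get read. This is a swapped-in technique, not the C-stdio approach above, and it is a sketch under assumptions: the values start at byte `offset`, each value is followed by exactly `skip` padding bytes (the last one possibly without its padding), and `from_file_skip` and `offset` are hypothetical names. Note that `np.memmap` raises an error for an empty file.

```python
import numpy as np

def from_file_skip(filename, count, skip=0, dtype=np.float64, offset=0):
    """Read up to `count` values of `dtype`, skipping `skip` bytes after each.

    The file is memory-mapped, so only the pages that are actually touched
    are read from disk; the selected values are then copied into a normal
    in-memory array.
    """
    dtype = np.dtype(dtype)
    raw = np.memmap(filename, dtype=np.uint8, mode="r", offset=offset)
    stride = dtype.itemsize + skip
    # number of complete values present (the last one may lack its padding)
    if raw.size < dtype.itemsize:
        n_avail = 0
    else:
        n_avail = (raw.size - dtype.itemsize) // stride + 1
    n = min(count, n_avail)
    # strided view: element k starts at byte k * stride of the mapping
    view = np.ndarray(shape=(n,), dtype=dtype, buffer=raw, strides=(stride,))
    return view.copy()
```

For example, `from_file_skip(filename, count=1000, skip=16)` would return at most 1000 doubles. The final `copy()` is what detaches the result from the mapping, so nothing keeps the file mapped afterwards.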
I'm trying to build a generic slice-sampling library with Cython: you supply a log-density and a starting value, and get a sample back. I'm working on the univariate model now. Based on the response here, I've come up with the following.
So I have a function defined in cSlice.pyx:
cdef double univariate_slice_sample(f_type_1 logd, double starter,
                                    double increment_size = 0.5):
    # some stuff
    return value
I have defined in cSlice.pxd:
cdef ctypedef double (*f_type_1)(double)
cdef double univariate_slice_sample(f_type_1 logd, double starter,
                                    double increment_size = *)
where logd is a generic univariate log-density.
In my distribution file, let's say cDistribution.pyx, I have the following:
from cSlice cimport univariate_slice_sample, f_type_1
cdef double log_distribution(alpha_k, y_k, prior):
    # some stuff
    return value
cdef double _sample_alpha_k_slice(
        double starter,
        double[:] y_k,
        Prior prior,
        double increment_size
):
    cdef f_type_1 f = lambda alpha_k: log_distribution(alpha_k, y_k, prior)
    return univariate_slice_sample(f, starter, increment_size)
cpdef double sample_alpha_k_slice(
        double starter,
        double[:] y_1,
        Prior prior,
        double increment_size = 0.5
):
    return _sample_alpha_k_slice(starter, y_1, prior, increment_size)
I need the wrapper because apparently lambdas aren't allowed in cpdef functions.
When I try compiling the distribution file, I get the following:
cDistribution.pyx:289:22: Cannot convert Python object to 'f_type_1'
pointing at the cdef f_type_1 f = ... line.
I'm unsure of what else to do. I want this code to maintain C speed and, importantly, not require the GIL. Any ideas?
You can JIT-compile a C callback/wrapper for any Python function (a Python object cannot be cast to a function pointer implicitly), as explained for example in this SO post.
However, at its core the function will remain a slow pure-Python function. Numba gives you the possibility to create real C callbacks via @cfunc. Here is a simplified example:
from numba import cfunc
@cfunc("float64(float64)")
def id_(x):
    return x
and this is how it could be used:
%%cython
ctypedef double(*f_type)(double)
cdef void c_print_double(double x, f_type f):
    print(2.0*f(x))
import numba
expected_signature = numba.float64(numba.float64)
def print_double(double x, f):
    # check the signature of f:
    if not f._sig == expected_signature:
        raise TypeError("cfunc has not the right type")
    # it is not possible to cast a Python object to a pointer directly,
    # so we cast the address first to unsigned long long
    c_print_double(x, <f_type><unsigned long long int>(f.address))
And now:
print_double(1.0, id_)
# 2.0
We need to check the signature of the cfunc object at run time; otherwise the cast <f_type><unsigned long long int>(f.address) would also "work" for functions with the wrong signature, only to (possibly) crash during the call or produce funny, hard-to-debug errors. I'm just not sure that my method is the best one, even though it works:
...
@cfunc("float32(float32)")
def id3_(x):
    return x
print_double(1.0, id3_)
# TypeError: cfunc has not the right type
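The integer-address cast in print_double can be illustrated without Cython or Numba using the standard-library ctypes module: a CFUNCTYPE prototype plays the role of f_type, and calling the prototype with an integer address rebuilds a typed function pointer. This is only a sketch of the mechanism, and the helper names are hypothetical. The callback below wraps a Python function, so every call still goes through the interpreter, which is exactly the slowness the @cfunc approach avoids.

```python
import ctypes

# C signature double(double), the same shape as f_type in the Cython cell
F_TYPE = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)

def make_callback(py_func):
    """Wrap a Python function as a C-callable function pointer.

    The returned object must be kept alive as long as C code may call it.
    """
    return F_TYPE(py_func)

def address_of(callback):
    """Integer address of the C entry point (analogous to cfunc.address)."""
    return ctypes.cast(callback, ctypes.c_void_p).value

def print_double_via_address(x, addr):
    """Rebuild a typed pointer from a bare integer and call it.

    Like <f_type><unsigned long long int>(addr), nothing here can verify
    that addr really points to a double(double) function.
    """
    f = F_TYPE(addr)
    result = 2.0 * f(x)
    print(result)
    return result

cb = make_callback(lambda x: x)
print_double_via_address(1.0, address_of(cb))  # prints 2.0
```

Keeping `cb` in a variable matters: if the callback object is garbage collected, the address becomes dangling, which is the same class of bug the signature check in print_double guards against from the other side.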
I have a C function that reads a binary file and returns a dynamically sized array of unsigned integers (the size is based on metadata from the binary file):
// example.c
#include <stdio.h>
#include <stdlib.h>
__declspec(dllexport) unsigned int *read_data(char *filename, size_t* array_size){
    FILE *f = fopen(filename, "rb");
    fread(array_size, sizeof(size_t), 1, f);
    unsigned int *array = (unsigned int *)malloc(*array_size * sizeof(unsigned int));
    fread(array, sizeof(unsigned int), *array_size, f);
    fclose(f);
    return array;
}
This answer appears to be saying that the correct way to pass the created array from C to Python is something like this:
# example_wrap.py
from ctypes import *
import os
os.add_dll_directory(os.getcwd())
indexer_dll = CDLL("example.dll")
def read_data(filename):
    filename = bytes(filename, 'utf-8')
    size = c_size_t()
    ptr = indexer_dll.read_data(filename, byref(size))
    return ptr[:size]
However, when I run the Python wrapper, the code silently fails at ptr[:size], as if I'm trying to access an array out of bounds (and I probably am). What is the correct way to pass this dynamically sized array?
A few considerations:
First, you need to set the prototype of the C function (argtypes and restype) so that ctypes can correctly convert between the C and Python types. Without restype, ctypes assumes the function returns a C int, so ptr would not be a pointer at all.
Second, since size is a ctypes.c_size_t object rather than a Python int, you need to use size.value to access the numeric value of the array size.
Third, since ptr[:size.value] actually copies the array contents to a Python list, you'll want to make sure you also free() the allocated C array since you're not going to use it anymore.
(Perhaps copying the array to a Python list is not ideal here, but I'll assume it's ok here, since otherwise you have more complexity in handling the C array in Python.)
This should work:
from ctypes import *
import os
os.add_dll_directory(os.getcwd())
indexer_dll = CDLL("example.dll")
# declare the C prototype: unsigned int *read_data(char *, size_t *)
indexer_dll.read_data.argtypes = [c_char_p, POINTER(c_size_t)]
indexer_dll.read_data.restype = POINTER(c_uint)
libc = cdll.msvcrt
def read_data(filename):
    filename = bytes(filename, 'utf-8')
    size = c_size_t()
    ptr = indexer_dll.read_data(filename, byref(size))
    result = ptr[:size.value]  # copies the values into a Python list
    libc.free(ptr)             # the C array is no longer needed
    return result
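Regarding the parenthetical above: if a NumPy array is acceptable instead of a list, `np.ctypeslib.as_array` can wrap the returned pointer. The sketch below uses a ctypes array as a stand-in for the malloc'd buffer so it runs without example.dll; the helper names are hypothetical. In the real wrapper the copy must happen before `libc.free(ptr)`, because `as_array` only creates a view of the C memory. One caveat worth checking: calling `msvcrt`'s `free` is only safe if the DLL was built against the same C runtime; exporting a matching free function from the DLL avoids that question.

```python
import ctypes
import numpy as np

def pointer_to_list(ptr, size):
    """Copy `size.value` items from a ctypes pointer into a Python list."""
    # size is a c_size_t object; slicing needs the plain int in size.value
    return ptr[:size.value]

def pointer_to_numpy(ptr, size):
    """Copy the C buffer into a NumPy array instead of a list.

    np.ctypeslib.as_array creates a view on the C memory (no copy), so it
    is copied before the caller frees that memory.
    """
    return np.ctypeslib.as_array(ptr, shape=(size.value,)).copy()

# stand-in for the malloc'd buffer returned by read_data
buf = (ctypes.c_uint * 4)(1, 2, 3, 4)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_uint))
size = ctypes.c_size_t(4)
print(pointer_to_list(ptr, size))            # [1, 2, 3, 4]
print(pointer_to_numpy(ptr, size).tolist())  # [1, 2, 3, 4]
```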
I need to get an overview of the performance one can get from using Cython in high-performance numerical code. One of the things I am interested in is whether an optimizing C compiler can vectorize code generated by Cython. So I decided to write the following small example:
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef int f(np.ndarray[int, ndim = 1] f):
    cdef int array_length = f.shape[0]
    cdef int sum = 0
    cdef int k
    for k in range(array_length):
        sum += f[k]
    return sum
I know that there are NumPy functions that do the job, but I would like to have simple code in order to understand what is possible with Cython. It turns out that the code generated with:
from distutils.core import setup
from Cython.Build import cythonize
setup(ext_modules = cythonize("sum.pyx"))
and called with:
python setup.py build_ext --inplace
generates C code that looks like this for the loop:
for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_1; __pyx_t_2 += 1) {
    __pyx_v_sum = __pyx_v_sum + (*(int *)((char *)
        __pyx_pybuffernd_f.rcbuffer->pybuffer.buf +
        __pyx_t_2 * __pyx_pybuffernd_f.diminfo[0].strides));
}
The main problem with this code is that the compiler does not know at compile time that __pyx_pybuffernd_f.diminfo[0].strides is such that the elements of the array are contiguous in memory. Without that information, the compiler cannot vectorize efficiently.
Is there a way to do such a thing from Cython?
You have two problems in your code (use option -a to make it visible):
Indexing the NumPy array isn't efficient.
You have forgotten the int in cdef sum=0.
Taking this into account we get:
cpdef int f(np.ndarray[np.int_t] f): ##HERE
    assert f.dtype == np.int_
    cdef int array_length = f.shape[0]
    cdef int sum = 0 ##HERE
    cdef int k
    for k in range(array_length):
        sum += f[k]
    return sum
For the loop, the following code is generated:
int __pyx_t_5;
int __pyx_t_6;
Py_ssize_t __pyx_t_7;
....
__pyx_t_5 = __pyx_v_array_length;
for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) {
    __pyx_v_k = __pyx_t_6;
    __pyx_t_7 = __pyx_v_k;
    __pyx_v_sum = (__pyx_v_sum + (*__Pyx_BufPtrStrided1d(__pyx_t_5numpy_int_t *, __pyx_pybuffernd_f.rcbuffer->pybuffer.buf, __pyx_t_7, __pyx_pybuffernd_f.diminfo[0].strides)));
}
This is not that bad, but it is not as easy for the optimizer as normal hand-written code. As you have already pointed out, __pyx_pybuffernd_f.diminfo[0].strides isn't known at compile time, and this prevents vectorization.
However, you get better results when using typed memoryviews, i.e.:
cpdef int mf(int[::1] f):
    cdef int array_length = len(f)
    ...
which leads to less opaque C code, which (at least) my compiler can optimize better:
__pyx_t_2 = __pyx_v_array_length;
for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
    __pyx_v_k = __pyx_t_3;
    __pyx_t_4 = __pyx_v_k;
    __pyx_v_sum = (__pyx_v_sum + (*((int *) ( /* dim=0 */ ((char *) (((int *) __pyx_v_f.data) + __pyx_t_4)) ))));
}
The crucial thing here is that we tell Cython that the memory is contiguous, i.e. int[::1] as opposed to int[:] (the general view used for NumPy arrays), for which a possible stride != 1 must be taken into account.
In this case, the Cython-generated C code results in the same assembly as the code I would have written by hand. As crisb has pointed out, adding -march=native leads to vectorization, but in that case the assembly of the two functions differs slightly again.
However, in my experience, compilers quite often have problems optimizing loops created by Cython, and/or it is easy to miss a detail that prevents the generation of really good C code. So my strategy for workhorse loops is to write them in plain C and use Cython for wrapping/accessing them. This is often somewhat faster, because one can also use dedicated compiler flags for this code snippet without affecting the whole Cython module.
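From the Python side, the difference between int[::1] and int[:] shows up in an array's strides: a contiguous memoryview only accepts arrays whose stride equals the item size, so a sliced array such as a[::2] has to be copied into a dense buffer first. A minimal sketch, assuming the Cython function expects C `int` (NumPy's `np.intc`); the helper name is hypothetical.

```python
import numpy as np

def as_c_int_contiguous(a):
    """Return `a` as a C-contiguous array of C ints, as int[::1] requires.

    np.ascontiguousarray only copies when the dtype or layout is wrong.
    """
    return np.ascontiguousarray(a, dtype=np.intc)

a = np.arange(10, dtype=np.intc)
b = a[::2]                      # every other element: stride is 2 items
c = as_c_int_contiguous(b)      # copied into a dense buffer
for name, arr in (("a", a), ("b", b), ("c", c)):
    # stride in units of elements; 1 means contiguous
    print(name, arr.strides[0] // arr.itemsize, arr.flags["C_CONTIGUOUS"])
# a 1 True
# b 2 False
# c 1 True
```

Passing `b` directly to a function declared with int[::1] would be rejected, while int[:] would accept it at the cost of the stride arithmetic shown in the generated C code above.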
I am fairly new to Cython, so this is probably fairly trivial, but I haven't been able to find the answer anywhere.
I've defined a struct type and I want to write a function that will initialize all the fields properly and return a pointer to the new struct.
from cpython.mem import PyMem_Malloc
ctypedef struct cell_t:
    DTYPE_t[2] min_bounds
    DTYPE_t[2] max_bounds
    DTYPE_t size
    bint is_leaf
    cell_t * children[4]
    DTYPE_t[2] center_of_mass
    UINT32_t count
cdef cell_t * make_cell(DTYPE_t[2] min_bounds, DTYPE_t[2] max_bounds):
    cdef cell_t * cell = <cell_t *>PyMem_Malloc(sizeof(cell_t)) # <- Fails here
    if not cell:
        MemoryError()
    cell.min_bounds[:] = min_bounds
    cell.max_bounds[:] = max_bounds
    cell.size = min_bounds[0] - max_bounds[0]
    cell.is_leaf = True
    cell.center_of_mass[:] = [0, 0]
    cell.count = 0
    return cell
However, when I try to compile this, I get the following two errors during compilation:
cdef cell_t * make_cell(DTYPE_t[2] min_bounds, DTYPE_t[2] max_bounds):
cdef cell_t * cell = <cell_t *>PyMem_Malloc(sizeof(cell_t))
^
Casting temporary Python object to non-numeric non-Python type
------------------------------------------------------------
cdef cell_t * make_cell(DTYPE_t[2] min_bounds, DTYPE_t[2] max_bounds):
cdef cell_t * cell = <cell_t *>PyMem_Malloc(sizeof(cell_t))
^
Storing unsafe C derivative of temporary Python reference
------------------------------------------------------------
Now, I've looked all over, and from what I can gather, cell is actually stored in a temporary variable that gets deallocated at the end of the function.
Any help would be greatly appreciated.
cell.min_bounds = min_bounds
This doesn't do what you think it does (although I'm not 100% sure what it does do). You need to copy arrays element by element:
cell.min_bounds[0] = min_bounds[0]
cell.min_bounds[1] = min_bounds[1]
Same for max_bounds.
The line that I suspect is giving you that error message is:
cell.center_of_mass = [0, 0]
This is trying to assign a Python list to a C array (remembering that arrays and pointers are somewhat interchangeable in C), which doesn't make much sense. Again, you'd do
cell.center_of_mass[0] = 0
cell.center_of_mass[1] = 0
All this is largely consistent with the C behaviour that there aren't operators to copy whole arrays into each other, you need to copy element by element.
Edit:
However that's not your immediate problem. You haven't declared PyMem_Malloc so it's assumed to be a Python function. You should do
from cpython.mem cimport PyMem_Malloc
Make sure it's cimported, not imported.
Edit2:
The following compiles fine for me:
from cpython.mem cimport PyMem_Malloc
ctypedef double DTYPE_t
ctypedef struct cell_t:
    DTYPE_t[2] min_bounds
    DTYPE_t[2] max_bounds
cdef cell_t * make_cell(DTYPE_t[2] min_bounds, DTYPE_t[2] max_bounds) except NULL:
    cdef cell_t * cell = <cell_t *>PyMem_Malloc(sizeof(cell_t))
    if not cell:
        raise MemoryError()
    return cell
I've cut down cell_t a bit (just to avoid having to make declarations of UINT32_t). I've also given the cdef function an except NULL to allow it to signal an error if needed and added a raise before MemoryError(). I don't think either of these changes are directly related to your error.
I am trying to build a wrapper for a camera driver written in C using Cython. I am new to Cython (I started 2 weeks ago). After a lot of struggle, I could successfully develop wrappers for structures and 1D arrays, but now I am stuck with 2D arrays.
One of the camera's C APIs takes a 2D array pointer as input and assigns the captured image to it. This function needs to be called from Python, and the output image needs to be processed/displayed in Python. After going through the Cython docs and various posts on Stack Overflow, I ended up more confused. I could not figure out how to pass 2D arrays between Python and C. The driver API looks (somewhat) like this:
driver.h
void assign_values2D(double **matrix, unsigned int row_size, unsigned int column_size);
c_driver.pxd
cdef extern from "driver.h":
    void assign_values2D(double **matrix, unsigned int row_size, unsigned int column_size)
test.pyx
from c_driver cimport assign_values2D
import numpy as np
cimport numpy as np
cimport cython
from libc.stdlib cimport malloc, free
import ctypes
@cython.boundscheck(False)
@cython.wraparound(False)
def assignValues2D(np.ndarray[np.double_t, ndim=2, mode='c'] mat):
    row_size, column_size = np.shape(mat)
    cdef np.ndarray[double, ndim=2, mode="c"] temp_mat = np.ascontiguousarray(mat, dtype = ctypes.c_double)
    # one pointer per row
    cdef double ** mat_pointer = <double **>malloc(row_size * sizeof(double*))
    if not mat_pointer:
        raise MemoryError
    try:
        for i in range(row_size):
            mat_pointer[i] = &temp_mat[i, 0]
        assign_values2D(<double **> &mat_pointer[0], row_size, column_size)
        return np.array(mat)
    finally:
        free(mat_pointer)
test_camera.py
import numpy as np
b = np.zeros((5,5), dtype=np.float64) # sample code
print("B Before = ")
print(b)
assignValues2D(b)
print("B After = ")
print(b)
When compiled, it gives the error:
Error compiling Cython file:
------------------------------------------------------------
...
if not mat_pointer:
raise MemoryError
try:
for i in range(row_size):
mat_pointer[i] = &temp_mat[i, 0]
^
------------------------------------------------------------
test.pyx:120:21: Cannot take address of Python variable
In fact, the above code was taken from a Stack Overflow post. I have tried several other ways, but none of them work. Please let me know how I can get the 2D image into Python. Thanks in advance.
You need to type i:
cdef int i
(Alternatively you can type row_size and it also works)
Once it knows that i is an int, it can work out the type that indexing temp_mat gives, and so the & operator works.
Cython is normally pretty good at figuring out the type of loop variables like i, but I think the issue is that it can't deduce the type of row_size, so it decides it can't deduce the type of i either, since that is deduced from range(row_size). Because of that it can't deduce the type of temp_mat[i,0].
I suspect you also want to change the return statement to return np.array(temp_mat): the code you have will likely work most of the time, but occasionally np.ascontiguousarray will have to make a copy and mat won't be changed.
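The Cython function builds a `double **` by storing the address of the first element of each row. The same bookkeeping can be sketched in pure Python with ctypes, which is handy for checking the layout (one pointer per row, consecutive rows `strides[0]` bytes apart) or for calling the driver directly if it is also available as a shared library (an assumption; the question only shows a header). The helper `row_pointers` is hypothetical, and the loop at the end merely simulates what assign_values2D would do in C.

```python
import ctypes
import numpy as np

DoublePtr = ctypes.POINTER(ctypes.c_double)

def row_pointers(mat):
    """Return a ctypes array of row pointers usable as a C `double **`.

    The pointers refer to mat's own memory: mat must be a C-contiguous
    float64 array and must stay alive while the pointers are in use.
    """
    if mat.dtype != np.float64 or not mat.flags["C_CONTIGUOUS"]:
        raise ValueError("expected a C-contiguous float64 array")
    n_rows = mat.shape[0]
    ptrs = (DoublePtr * n_rows)()      # one pointer per row, not per column
    base = mat.ctypes.data             # address of mat[0, 0]
    for i in range(n_rows):
        ptrs[i] = ctypes.cast(base + i * mat.strides[0], DoublePtr)
    return ptrs

# simulate what assign_values2D(matrix, rows, cols) would do in C
m = np.zeros((3, 4))
p = row_pointers(m)
for i in range(3):
    for j in range(4):
        p[i][j] = float(10 * i + j)
print(m[2, 3])  # 23.0
```

Because the writes go through pointers into `m` itself, no copy back is needed; this is the same reason the Cython version should return the array it actually handed to the C function.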