Cython class containing C strings; buffer overrun?

Trying to learn a little Cython, I've been attempting to write a toy library that just holds a few C strings (corresponding to the available choices for a factor/categorical data type). The strings being pointed to within the class are being overwritten, and my C/Cython-foo is too minimal to figure out why.
The result is something like this:
>>> import coupla
>>> ff = coupla.CouplaStrings(["one", "two"])
>>> ff
write, two
>>> ff
, two
>>> ff
two, two
Help is greatly appreciated! I feel like I'm going crazy. Just using the to_cstring_array and to_str_list functions seems to work fine, but within the class it goes kaputt.
cdef extern from "Python.h":
    char* PyUnicode_AsUTF8(object unicode)

from libc.stdlib cimport malloc, free

cdef char **to_cstring_array(list_str):
    """Stolen from Stackoverflow:
    https://stackoverflow.com/questions/17511309/fast-string-array-cython/17511714#17511714
    """
    cdef Py_ssize_t num_strs = len(list_str)
    cdef char **ret = <char **>malloc(num_strs * sizeof(char *))
    for i in range(num_strs):
        ret[i] = PyUnicode_AsUTF8(list_str[i])
    return ret

cdef to_str_list(char **cstr_array, Py_ssize_t size):
    cdef int i
    result = []
    for i in range(size):
        result.append(bytes(cstr_array[i]).decode("utf-8"))
    return result

cdef class CouplaStrings:
    cdef char **_strings
    cdef Py_ssize_t _num_strings

    def __init__(self, strings):
        cdef Py_ssize_t num_strings = len(strings)
        cdef char **tstrings = <char **> to_cstring_array(strings)
        self._num_strings = num_strings
        self._strings = tstrings

    def __repr__(self):
        """Just for testing."""
        return ", ".join(to_str_list(self._strings, self._num_strings))

    def __dealloc__(self):
        free(self._strings)
Edit:
See the answer below by user2357112. An edited version of CouplaStrings seems to avoid that particular problem, though I wouldn't swear on its overall correctness.
Edit 2: This is wrong; ignore it. It is only kept for historical purposes.
cdef class CouplaStrings:
    cdef char **_strings
    cdef Py_ssize_t _num_strings

    def __init__(self, strings):
        cdef Py_ssize_t num_strings = len(strings)
        cdef char **ret = <char **> PyMem_Malloc(num_strings * sizeof(char *))
        for i in range(num_strings):
            ret[i] = <char *> PyMem_Realloc(PyUnicode_AsUTF8(strings[i]),
                                            sizeof(char *))
        self._num_strings = num_strings
        self._strings = ret

    def __repr__(self):
        """Just for testing."""
        return ", ".join(to_str_list(self._strings, self._num_strings))

    def __dealloc__(self):
        PyMem_Free(self._strings)

You've failed to account for ownership and memory management.
The UTF-8 encoding returned by PyUnicode_AsUTF8 is owned by the string object PyUnicode_AsUTF8 was called on, and it is reclaimed when that string dies. To prevent the string object from dying before your object does, your object needs to keep a (Python) reference to the string object. Alternatively, you can copy the UTF-8 encodings into memory you allocate yourself, and take responsibility for freeing that memory yourself.
Otherwise, you'll just have an array of dangling pointers.
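As an illustration of the second option (copying), here is a minimal, untested sketch of a CouplaStrings that owns its own copies of the UTF-8 data, so the pointers stay valid even after the original Python strings die (error handling omitted for brevity):

cdef extern from "Python.h":
    char* PyUnicode_AsUTF8(object unicode)

from libc.stdlib cimport malloc, free
from libc.string cimport strlen, strcpy

cdef class CouplaStrings:
    cdef char **_strings
    cdef Py_ssize_t _num_strings

    def __init__(self, strings):
        cdef Py_ssize_t i
        cdef char *utf8
        self._num_strings = len(strings)
        self._strings = <char **>malloc(self._num_strings * sizeof(char *))
        for i in range(self._num_strings):
            # copy the buffer owned by the Python string into memory
            # owned by this object
            utf8 = PyUnicode_AsUTF8(strings[i])
            self._strings[i] = <char *>malloc(strlen(utf8) + 1)
            strcpy(self._strings[i], utf8)

    def __repr__(self):
        return ", ".join(
            bytes(self._strings[i]).decode("utf-8")
            for i in range(self._num_strings))

    def __dealloc__(self):
        cdef Py_ssize_t i
        for i in range(self._num_strings):
            free(self._strings[i])
        free(self._strings)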

Related

Cython Create C Function Alias

I have two variants of a function, void func1(double *) and void func2(double *), that are externed from C++ code.
I want to be able to write a function or mapping that wraps them:

cdef func_alias(int choice):
    if choice == 0:
        return func1
    elif choice == 1:
        return func2

But compiling this fails with: Cannot convert 'void (double *) nogil' to Python object.
Alternatively, I have tried using a dict, which produces the same error:

cdef dict func_dict = {0: func1, 1: func2}

I am not sure if I can do something along the lines of

from libcpp.map cimport map
cdef map[int, void] func_map = {0: func1, 1: func2}

which results in Cannot interpret dict as type 'map[int,void]'.
Your func_alias function does not declare a return type, which means it defaults to returning a Python object. Since a C function pointer is not a valid Python object, Cython gives you that error message at compile time. We can define a ctypedef for the function pointer type and use that as the return type instead. Here is an example that does just that:
ctypedef void (* double_func)(double *)

cdef void func_1(double *arg1):
    print(1, arg1[0])

cdef void func_2(double *arg1):
    print(2, arg1[0])

cdef double_func func_alias(int choice):
    if choice == 1:
        return func_1
    elif choice == 2:
        return func_2

cdef double test_input = 3.14
func_alias(1)(&test_input)
func_alias(2)(&test_input)
As a side note, if you only have a fixed number of potential function pointers to consider, I would use an enum for the choice values instead of bare ints in the if-statements; a rough sketch of that is shown below. Let me know if anything is unclear.
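For illustration, a sketch of that enum idea might look like the following (FuncChoice, FIRST and SECOND are made-up names; it reuses double_func, func_1 and func_2 from the example above):

cdef enum FuncChoice:
    FIRST = 1
    SECOND = 2

cdef double_func func_alias_enum(FuncChoice choice):
    # dispatch on named constants instead of bare integers
    if choice == FIRST:
        return func_1
    elif choice == SECOND:
        return func_2
    return NULL  # unknown choice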
Update:
Looking at the second part of the question, I see that you were also considering using a hashmap to map ints to function pointers. While you can't use a Python dict for this, since dicts can only store Python objects, you can use a C++ map (or unordered_map, which should perform slightly better). Unfortunately, you cannot use the convenient Python dict syntax to initialize the map in one go, and instead must add items one by one. Here is that approach in action:
from libcpp.unordered_map cimport unordered_map

ctypedef void (* double_func)(double *)

cdef unordered_map[int, double_func] func_map
func_map[1] = func_1
func_map[2] = func_2

cdef void func_1(double *arg1):
    print(1, arg1[0])

cdef void func_2(double *arg1):
    print(2, arg1[0])

cdef double_func func_alias(int choice):
    return func_map[choice]

cdef double test_input = 3.14
func_alias(1)(&test_input)
func_alias(2)(&test_input)

Passing a struct* from one Cython class to another

I am trying to pass a struct pointer from one Cython class to another. Here is some example code:
from libc.stdlib cimport malloc

cdef struct MyStruct:
    int a
    int b

cdef class MyClass:
    cdef MyStruct* s

    def __init__(self):
        self.s = <MyStruct*> malloc(sizeof(MyStruct))
        self.s.a = 1
        self.s.b = 2

    cdef MyStruct* get_my_struct(self):
        return self.s

cdef class PrinterClass:
    cdef object m

    def __init__(self):
        self.m = MyClass()

    cpdef print_struct(self):
        cdef MyStruct* my_struct
        my_struct = self.m.get_my_struct()
        print(my_struct.a)
When I try to compile this, I get these two errors around the my_struct = self.m.get_my_struct() line:
Cannot convert Python object to 'MyStruct *'
and
Storing unsafe C derivative of temporary Python reference
Why is Cython attempting to do conversions here? Can't it just pass the pointer as is?
In PrinterClass, replace cdef object m with cdef MyClass m, or explicitly cast self.m to MyClass: my_struct = (<MyClass>self.m).get_my_struct(). (In addition, a __dealloc__ should be added to MyClass.)
I guess the difference is that object is a generic Python object (in essence, a dict), while a cdef class is another kind of class (in essence, a struct); see Extension types (aka. cdef classes).
Expect further revelations from other experts :)
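A minimal, untested sketch of the first suggestion (reusing the question's MyStruct, with the __dealloc__ mentioned above added):

from libc.stdlib cimport malloc, free

cdef class MyClass:
    cdef MyStruct* s

    def __init__(self):
        self.s = <MyStruct*> malloc(sizeof(MyStruct))
        self.s.a = 1
        self.s.b = 2

    cdef MyStruct* get_my_struct(self):
        return self.s

    def __dealloc__(self):
        # free the struct allocated in __init__
        free(self.s)

cdef class PrinterClass:
    cdef MyClass m  # typed attribute, so Cython knows what get_my_struct returns

    def __init__(self):
        self.m = MyClass()

    cpdef print_struct(self):
        cdef MyStruct* my_struct = self.m.get_my_struct()
        print(my_struct.a)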

Cannot assign type 'myFuncDef *' to 'void (*)(double *, double *)'

I am struggling with the concept of pointers in my Cython code. The following example is a simplified version of what I am trying to do. I have a function func to which I would like to pass a function (a distribution function) as an input parameter. The distribution function takes two double pointers as input variables.
from cpython cimport array
import cython
import ctypes
cimport numpy as np

ctypedef void (*myFuncDef)(double *, double *)

from cython.parallel import prange

cdef void func(int* x, double* hx, void(*func)(double*, double*), int n):
    cdef int i
    for i from 0 <= i < n:
        func[0](&x[i], &hx[i])
    return

cpdef void Lognormal(double* u,
                     double* yu):
    # evaluate log of normal distribution
    yu = -u * u * 0.5
    return

def foo(np.ndarray[ndim=1, dtype=np.float64_t] x,
        np.ndarray[ndim=1, dtype=np.float64_t] hx,
        myFuncDef distribution, int k):
    cdef np.ndarray[ndim=1, dtype=np.float64_t] sp
    cdef int num = len(x)
    cdef int j
    for j from 0 <= j < k:
        func(&x[0], &hx[0], &distribution, num)
        sp[j] = hx[0]
    return sp
So I would like to use the Lognormal function as an input to the foo function. I get the following error message:
Cannot assign type 'myFuncDef *' to 'void (*)(double *, double *)'
I would appreciate any suggestion to fix this bug.
You have one too many layers of pointer (the standard issue raised to argue against typedefs for pointers). Just drop the & from &distribution; it's already the function pointer you want.
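For illustration, the call inside the loop in foo then becomes something like this (only the & is removed; the rest of the example is left untouched):

func(&x[0], &hx[0], distribution, num)  # distribution is already a function pointer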

How to structure a cython module with many identical function signatures

I've written a Cython module which wraps a foreign C function, and it's working as expected. However, I'd like to wrap the rest of the functions provided by my C binary, which have identical signatures. In Python, I could just do:
def boilerplate(func):
    def wrapped_f(c, d):
        # modify c and d, producing mod_c and mod_d
        result = func(mod_c, mod_d)
        # modify foreign function return values, producing final_c, final_d
        return final_c, final_d
    return wrapped_f

@boilerplate
def func_a(a, b):
    return _foreign_func_a(a, b)

@boilerplate
def func_b(a, b):
    return _foreign_func_b(a, b)
Is there a similar pattern I can use in Cython, in order to "cythonise" wrapped_f, assuming _foreign_func_a and its accompanying structs etc. have been cimported?
However, when I move the generic operations into the decorator:
def boilerplate(func):
    def wrapped(double[::1] wlon, double[::1] wlat):
        cdef _FFIArray x_ffi, y_ffi
        x_ffi.data = <void*>&wlon[0]
        x_ffi.len = wlon.shape[0]
        y_ffi.data = <void*>&wlat[0]
        y_ffi.len = wlat.shape[0]
        cdef _Result_Tuple result = func(x_ffi, y_ffi)
        cdef double* eastings_ptr = <double*>(result.e.data)
        cdef double* northings_ptr = <double*>(result.n.data)
        cdef double[::1] e = <double[:result.e.len:1]>eastings_ptr
        cdef double[::1] n = <double[:result.n.len:1]>northings_ptr
        e_numpy = np.copy(e)
        n_numpy = np.copy(n)
        drop_float_array(result.e, result.n)
        return e_numpy, n_numpy
    return wrapped

@boilerplate
def convert_bng(double[::1] lons, double[::1] lats):
    """wrapper around threaded conversion function
    """
    return convert_to_bng_threaded(lons, lats)
I get errors when
trying to convert x_ffi and y_ffi from _FFIArray to Python objects in wrapped,
converting the Python object func to _Result_Tuple in wrapped,
converting lons and lats to _FFIArray in convert_to_bng_threaded, and
converting _Result_Tuple back to a Python object in convert_bng_threaded.
Your essential problem (based on your updated question) is that you're trying to wrap a function that takes pure C data types (and thus can only be defined as a cdef function, which can be called from Cython but not from Python). However, decorators work on Python functions, so it doesn't quite come together.
Fortunately, you can do something very similar by handling the wrapped function as a C function pointer. You need a slightly different syntax, but the idea is very much the same. (For the sake of this answer I'm assuming you are using the definitions of the C data types from this previous question, which I think is reasonable.)
# pass a function pointer in
cdef boilerplate(_Result_Tuple (*func)(_FFIArray, _FFIArray)):
    def wrapped(double[::1] wlon, double[::1] wlat):
        cdef _FFIArray x_ffi, y_ffi
        x_ffi.data = <void*>&wlon[0]
        x_ffi.len = wlon.shape[0]
        y_ffi.data = <void*>&wlat[0]
        y_ffi.len = wlat.shape[0]
        cdef _Result_Tuple result = func(x_ffi, y_ffi)
        cdef double* eastings_ptr = <double*>(result.e.data)
        cdef double* northings_ptr = <double*>(result.n.data)
        cdef double[::1] e = <double[:result.e.len:1]>eastings_ptr
        cdef double[::1] n = <double[:result.n.len:1]>northings_ptr
        e_numpy = np.copy(e)
        n_numpy = np.copy(n)
        drop_float_array(result.e, result.n)
        return e_numpy, n_numpy
    return wrapped

# do this instead of using decorator syntax
convert_bng = boilerplate(&convert_to_bng_threaded)
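Usage is then the same as with the decorator version; a rough sketch (the coordinate values here are made up, and any contiguous float64 array works since the wrapper takes double[::1] memoryviews):

import numpy as np

lons = np.array([-2.0, -1.5], dtype=np.float64)
lats = np.array([54.5, 53.8], dtype=np.float64)
eastings, northings = convert_bng(lons, lats)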

Can I use this parallel iterator pattern with Cython?

With C++11 I have been using the following pattern for implementing a graph data structure with parallel iterators. Nodes are just indices, edges are entries in an adjacency data structure. For iterating over all nodes, a function (lambda, closure...) is passed to a parallelForNodes method and called with each node as an argument. Iteration details are nicely encapsulated in the method.
Now I would like to try the same concept with Cython. Cython provides the cython.parallel.prange function which uses OpenMP for parallelizing a loop over a range. For parallelism to work, Python's Global Interpreter Lock needs to be deactivated with the nogil=True parameter. Without the GIL, using Python objects is not allowed, which makes this tricky.
Is it possible to use this approach with Cython?
class Graph:
    def __init__(self, n=0):
        self.n = n
        self.m = 0
        self.z = n  # max node id
        self.adja = [[] for i in range(self.z)]
        self.deg = [0 for i in range(self.z)]

    def forNodes(self, handle):
        for u in range(self.z):
            handle(u)

    def parallelForNodes(self, handle):
        # first attempt which will fail...
        for u in prange(self.z, nogil=True):
            handle(u)

# usage
def initialize(u):
    nonlocal ls
    ls[u] = 1

G.parallelForNodes(initialize)
Firstly, things cannot be Python objects without the GIL.
from cython.parallel import prange

cdef class Graph:
    cdef int n, m, z

    def __cinit__(self, int n=0):
        self.z = n  # max node id

    cdef void parallelForNodes(self, void (*handle)(int) nogil) nogil:
        cdef int u
        for u in prange(self.z, nogil=True):
            handle(u)
The biggest catch is that the function pointer we pass in must itself be declared nogil.
parallelForNodes does not have to be nogil itself, but there's no reason for it not to be.
Then we need a C function to call:
cdef int[100] ls

cdef void initialize(int u) nogil:
    global ls
    ls[u] = 1
and it just works!
Graph(100).parallelForNodes(initialize)
# Print it!
cdef int[:] ls_ = ls
print(list(ls_))
