I am trying to write a Cython extension to CPython to wrap the mcrypt library, so that I can use it with Python 3. However, I am running into a problem where I segfault while trying to use one of the mcrypt APIs.
The code that is failing is:
def _real_encrypt(self, source):
    src_len = len(source)
    cdef char* ciphertext = source
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext, src_len)
    retval = source[:src_len]
    return retval
Now, the way I understand the Cython documentation, the cdef char* ciphertext = source assignment should copy the contents of the buffer (a bytes object in Python 3) to the C string pointer. I assumed that this would also allocate the memory, but when I made this modification:
def _real_encrypt(self, source):
    src_len = len(source)
    cdef char* ciphertext = <char *>malloc(src_len)
    ciphertext = source
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext, src_len)
    retval = source[:src_len]
    return retval
it still crashed with a segfault. It crashes inside mcrypt_generic, but the equivalent plain C code works just fine, so there has to be something I am not quite understanding about how Cython works with C data here.
Thanks for any help!
ETA: The problem was a bug on my part. I was working on this after being awake for far too many hours (isn't that something we've all done at some point?) and missed something stupid. The code that I now have, which works, is:
def _real_encrypt(self, source):
    src_len = len(source)
    cdef char *ciphertext = <char *>malloc(src_len)
    cmc.strncpy(ciphertext, source, src_len)
    cmc.mcrypt_generic_init(self._mcStream, <void *>self._key,
                            len(self._key), NULL)
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext,
                       src_len)
    retval = ciphertext[:src_len]
    cmc.mcrypt_generic_deinit(self._mcStream)
    free(ciphertext)  # release the scratch buffer (cimported alongside malloc)
    return retval
It's probably not the most efficient code in the world, since it makes one copy for the encryption and a second copy for the return value. I'm not sure that can be avoided, though, because I don't know whether a newly-allocated buffer can be handed back to Python in place as a bytestring. Now that I have a working function, I'm going to implement a block-by-block method as well, so that one can provide an iterable of blocks for encryption or decryption without having the entire source and destination in memory at once. That way it would be possible to encrypt or decrypt huge files without holding up to three copies of the data in memory at any one point.
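For the record, a minimal sketch of what that block-by-block method might look like, assuming each block is a bytes object sized to the cipher's block size (the name encrypt_blocks is mine, not part of the original code):

def encrypt_blocks(self, blocks):
    # Encrypt an iterable of byte blocks one at a time, so the whole
    # source and destination never have to be in memory at once.
    for block in blocks:
        yield self._real_encrypt(block)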
Thanks for the help, everyone!
The first version points the char* at the Python string's internal buffer. The second allocates memory, but then re-points the pointer at the Python string and leaks the newly allocated block. You should be invoking the C library function strcpy (or memcpy) from Cython, presumably, but I don't know the details.
A few comments on your code to help improve it, IMHO. There are functions provided by the Python C API that do exactly what you need and make sure everything conforms to the Python way of doing things. They will handle embedded NULs without a problem.
Rather than calling malloc directly, change this:
cdef char *ciphertext = <char *>malloc(src_len)
to
cdef str retval = PyString_FromStringAndSize(PyString_AsString(source), <Py_ssize_t>src_len)
cdef char *ciphertext = PyString_AsString(retval)
The above lines will create a brand new Python str object initialized to the contents of source. The second line points ciphertext to retval's internal char * buffer without copying. Whatever modifies ciphertext will modify retval. Since retval is a brand new Python str, it can be modified by C code before being returned from _real_encrypt.
See the Python C API docs on PyString_FromStringAndSize and PyString_AsString for more details.
The net effect saves you a copy. The whole code would be something like:
cdef extern from "Python.h":
    object PyString_FromStringAndSize(char *, Py_ssize_t)
    char *PyString_AsString(object)

def _real_encrypt(self, source):
    src_len = len(source)
    cdef str retval = PyString_FromStringAndSize(PyString_AsString(source), <Py_ssize_t>src_len)
    cdef char *ciphertext = PyString_AsString(retval)
    cmc.mcrypt_generic_init(self._mcStream, <void *>self._key,
                            len(self._key), NULL)
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext,
                       src_len)
    # Since the above initialized ciphertext, the retval str is correctly initialized too.
    cmc.mcrypt_generic_deinit(self._mcStream)
    return retval
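Since the question targets Python 3, where the PyString API no longer exists, the same zero-extra-copy trick can be written against the bytes API. A minimal sketch, assuming the same cmc declarations as in the question:

from cpython.bytes cimport PyBytes_FromStringAndSize, PyBytes_AS_STRING

def _real_encrypt(self, source):
    cdef Py_ssize_t src_len = len(source)
    # Build a new bytes object pre-filled with the plaintext, then let
    # mcrypt encrypt its internal buffer in place. Mutating the bytes
    # object is only safe because nothing else holds a reference yet.
    retval = PyBytes_FromStringAndSize(<char *>source, src_len)
    cdef char *ciphertext = PyBytes_AS_STRING(retval)
    cmc.mcrypt_generic_init(self._mcStream, <void *>self._key,
                            len(self._key), NULL)
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext, src_len)
    cmc.mcrypt_generic_deinit(self._mcStream)
    return retval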
The approach I've used (with Python 2.x) is to declare the string type parameters in the function signature so that the Cython code does all conversions and type checking automatically:
def _real_encrypt(self, char *src):
...
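For completeness, a sketch of how the rest of that signature-based version might look, assuming the same cmc declarations as in the question. The explicit src_len parameter is an addition of mine: calling len() on a char* uses strlen, which would stop at the first NUL byte and truncate binary plaintext.

from libc.stdlib cimport malloc, free
from libc.string cimport memcpy

def _real_encrypt(self, char *src, Py_ssize_t src_len):
    # Cython converts a passed bytes/str object to char* automatically
    # and raises TypeError for anything else.
    cdef char *ciphertext = <char *>malloc(src_len)
    if not ciphertext:
        raise MemoryError()
    memcpy(ciphertext, src, src_len)
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext, src_len)
    retval = ciphertext[:src_len]
    free(ciphertext)
    return retval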
We need to create a PyCapsule from a method of a class in Cython. We managed to write code which compiles and even runs without error, but the results are wrong.
A simple example is here: https://github.com/paugier/cython_capi/tree/master/using_cpython_pycapsule_class
The capsules are executed by Pythran (one needs to use the version on github https://github.com/serge-sans-paille/pythran).
The .pyx file:
from cpython.pycapsule cimport PyCapsule_New

cdef int twice_func(int c):
    return 2*c

cdef class Twice:
    cdef public dict __pyx_capi__

    def __init__(self):
        self.__pyx_capi__ = self.get_capi()

    cpdef get_capi(self):
        return {
            'twice_func': PyCapsule_New(
                <void *>twice_func, 'int (int)', NULL),
            'twice_cpdef': PyCapsule_New(
                <void *>self.twice_cpdef, 'int (int)', NULL),
            'twice_cdef': PyCapsule_New(
                <void *>self.twice_cdef, 'int (int)', NULL),
            'twice_static': PyCapsule_New(
                <void *>self.twice_static, 'int (int)', NULL)}

    cpdef int twice_cpdef(self, int c):
        return 2*c

    cdef int twice_cdef(self, int c):
        return 2*c

    @staticmethod
    cdef int twice_static(int c):
        return 2*c
The file compiled by Pythran (call_capsule_pythran.py):
# pythran export call_capsule(int(int), int)
def call_capsule(capsule, n):
    r = capsule(n)
    return r
Once again it is a new feature of Pythran so one needs the version on github...
And the test file:
try:
    import faulthandler
    faulthandler.enable()
except ImportError:
    pass

import unittest

from twice import Twice
from call_capsule_pythran import call_capsule


class TestAll(unittest.TestCase):
    def setUp(self):
        self.obj = Twice()
        self.capi = self.obj.__pyx_capi__

    def test_pythran(self):
        value = 41
        print('\n')
        for name, capsule in self.capi.items():
            print('capsule', name)
            result = call_capsule(capsule, value)
            if name.startswith('twice'):
                if result != 2*value:
                    how = 'wrong'
                else:
                    how = 'good'
                print(how, f'result ({result})\n')


if __name__ == '__main__':
    unittest.main()
It is buggy and gives:
capsule twice_func
good result (82)
capsule twice_cpdef
wrong result (4006664390)
capsule twice_cdef
wrong result (4006664390)
capsule twice_static
good result (82)
It shows that it works fine for the standard function and for the static function but that there is a problem for the methods.
Note that the fact that it works for two capsules seems to indicate that the problem does not come from Pythran.
Edit
After DavidW's comments, I understand that we would have to create at run time (for example in get_capi) a C function with the signature int(int) from the bound method twice_cdef whose signature is actually int(Twice, int).
I don't know if this is really impossible to do with Cython...
To follow up/expand on my comments:
The basic issue is that Pythran expects the PyCapsule to contain a C function pointer with the signature int f(int). However, the signature of your methods is int(PyObject* self, int c). The integer argument gets passed as self (not causing disaster, since self isn't actually used...), and some arbitrary bit of memory is used in place of int c. Unfortunately it isn't possible in pure C to create a function pointer with "bound arguments", so Cython can't (and realistically won't be able to) do this.
Modification 1 is to get better compile-time type checking of what you're passing to your PyCapsules by creating a function that accepts the correct types and casting in there, rather than just casting to <void*> blindly. This doesn't solve your problem but warns you at compile-time when it isn't going to work:
ctypedef int(*f_ptr_type)(int)

cdef make_PyCapsule(f_ptr_type f, string):
    return PyCapsule_New(
        <void *>f, string, NULL)

# then in get_capi:
'twice_func': make_PyCapsule(twice_func, b'int (int)'), # etc
It is actually possible to create a C function from an arbitrary Python callable using ctypes (or cffi) - see Using function pointers to methods of classes without the gil (bottom of answer). This adds an extra layer of Python calls, so it isn't terribly quick, and the code is a bit messy. ctypes achieves this by using runtime code generation (which isn't that portable, or something you can do in pure C) to build a function on the fly and then create a pointer to that.
Although you claim in the comments that you don't think you can use the Python interpreter, I don't think this is true - Pythran generates Python extension modules (so is pretty bound to the Python interpreter) and it seems to work in your test case shown here:
_func_cache = []

cdef f_ptr_type py_to_fptr(f):
    import ctypes
    functype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)
    ctypes_f = functype(f)
    _func_cache.append(ctypes_f)  # ensure references are kept
    return (<f_ptr_type*><size_t>ctypes.addressof(ctypes_f))[0]

# then in get_capi:
'twice_cpdef': make_PyCapsule(py_to_fptr(self.twice_cpdef), b'int (int)')
Unfortunately it only works for cpdef and not cdef functions since it does rely on having a Python callable. cdef functions can be made to work with a lambda (provided you change get_capi to def instead of cpdef):
'twice_cdef': make_PyCapsule(py_to_fptr(lambda x: self.twice_cdef(x)), b'int (int)'),
It's all a little messy but can be made to work.
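Putting the pieces together, get_capi might end up looking something like this (a sketch only, using the make_PyCapsule and py_to_fptr helpers above; note it is now def rather than cpdef so the lambda can close over self, and the twice_static entry, which already worked, is omitted):

def get_capi(self):
    return {
        'twice_func': make_PyCapsule(twice_func, b'int (int)'),
        'twice_cpdef': make_PyCapsule(py_to_fptr(self.twice_cpdef),
                                      b'int (int)'),
        'twice_cdef': make_PyCapsule(
            py_to_fptr(lambda x: self.twice_cdef(x)), b'int (int)')}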
This (I hope) is quite a simple issue, but despite doing some reading (I'm v. new to SWIG, and fairly green C-wise) I'm just not able to make the "connection" in my head.
I have a function from a library (legacy code, keen not to edit):
extern int myfunction(char *infile, char *maskfile, int check, float *median, char *msg);
My aim is to create a wrapper for this in Python using SWIG.
The values of the median and msg variables are changed by the C function. When the returned int != 0, there will be some error information in the msg argument. When the returned int == 0, the median variable will contain a float with its value assigned by myfunction.
This generally runs OK where the return value is 0. I use %array_functions and %pointer_functions to create the pointers needing to be passed, as per this .i file:
%module test
%include "cpointer.i"
%include "carrays.i"
%{
#include <stdint.h>
%}
extern int myfunction(char *infile, char *maskfile, int check, float *median, char *msg);
%pointer_functions(float, floatp);
%pointer_functions(char, charp);
%array_functions(char, charArray);
After swig-ing, compiling and linking, I can call the function in Python:
import test

errmsg_buffer = 1024
_infile = 'test2.dat'
infile = test.new_charArray(len(_infile))
for i in xrange(len(_infile)):
    test.charArray_setitem(infile, i, _infile[i])
maskfile = test.new_charArray(1)
test.charArray_setitem(maskfile, 0, '\0')
check = 0
med = test.new_floatp()
errmsg = test.new_charArray(errmsg_buffer)
out = test.myfunction(infile, maskfile, check, med, errmsg)
median = test.floatp_value(med)
This works sometimes, but often not - I get a lot of segfaults which are generally fixed by changing the errmsg_buffer length (clearly not a useful fix!). The C code that changes the msg string is:
(void)sprintf(errmsg,"file not found");
My main issue is in proper handling of msg string, which I suspect is causing the segfaults (and might be due to incorrect implementation via new_charArray?).
What is the best way to do this?
Can I add something to the .i file that converts the char *msg into a Python str?
Can this be done without "pre-initialising" with new_charArray? I'd presumably get a buffer overflow if errmsg_buffer is too small.
I hope this is clear - happy to add comments for further discussion.
Your wrapper can be much simplified using SWIG. Try this SWIG interface file (details below):
%module test
%include "typemaps.i"
%include "cstring.i"
%apply float *OUTPUT { float *median };
%cstring_bounded_output(char *msg, 1024);
extern int myfunction(char *infile, char *maskfile, int check, float *median, char *msg);
Then, from python, use the module in the following way:
import test
infile = 'test2.dat'
maskfile = ''
check = 0
out, median, errmsg = test.myfunction(infile,maskfile,check)
if out != 0: print(errmsg)
...
However, from what you write, it is not quite clear to me why your approach segfaults.
Details
The typemaps.i file contains the float *OUTPUT typemap, which is then applied to the float *median argument and turns this from an argument into a float output value. See the SWIG docs on argument handling for details.
The cstring.i file contains SWIG macros to deal with C strings. Here, I used the %cstring_bounded_output macro. This creates a char * buffer of the given size 1024 and passes it as the char *msg argument automatically. After the function completes, the buffer contents are converted into a Python string and appended to the output. See the SWIG docs on cstring.i for details.
SWIG handles the first two char * arguments by default, converting Python strings to appropriate char * values and passing those. Note that the passed char * for these arguments is immutable, i.e., if your myfunction attempts to modify them, bad things will happen. Read about how SWIG handles C strings in the SWIG docs.
So, your wrapped myfunction then is used as shown above and has the following signature in python:
myfunction(infile, maskfile, check) -> (out, median, msg)
EDIT:
The SWIG docs about carrays.i state:
Note: %array_functions() and %array_class() should not be used with types of char or char *.
I think your code is not creating correctly NUL-terminated C strings, so perhaps this is what is causing the segfaults.
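If you want to keep the %array_functions approach anyway, here is a sketch of what explicit NUL termination might look like (using the module name test from the question):

_infile = 'test2.dat'
# Allocate one extra slot for the terminating NUL byte.
infile = test.new_charArray(len(_infile) + 1)
for i in xrange(len(_infile)):
    test.charArray_setitem(infile, i, _infile[i])
test.charArray_setitem(infile, len(_infile), '\0')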
I have not learned SWIG very deeply, but I will try to give you some suggestions.
1.
If your program modifies the input parameter or uses it to return data, consider using the cstring.i library file described in the SWIG Library chapter.
Data is copied into a new Python string and returned.
If your program needs to work with binary data, you can use a typemap to expand a Python string into a pointer/length argument pair. As luck would have it, just such a typemap is already defined. Just do this:
%apply (char *STRING, int LENGTH) { (char *data, int size) };
...
int parity(char *data, int size, int initial);
Python:
parity("e\x09ffss\x00\x00\x01\nx", 0)
If you need to return binary data, you might use the cstring.i library file. The cdata.i library can also be used to extract binary data from arbitrary pointers.
2. I think "pre-initialising" may be necessary.
I was writing code to store a (potentially) very large integer value into an array of chars referenced by a pointer. My code looks like this:
cdef class Variable:
    cdef unsigned int Length
    cdef char * Array

    def __cinit__(self, var, length):
        self.Length = length
        self.Array = <char *>malloc(self.Length * sizeof(char)) # Error
        for i in range(self.Length):
            self.Array[i] = <char>(var >> (8 * i))

    def __dealloc__(self):
        self.Array = NULL
When I tried compiling the code, I got the error, "Storing unsafe C derivative of temporary Python reference" at the commented line. My question is this: which temporary Python reference am I deriving in C and storing, and how do I fix it?
The problem is that under the hood a temporary variable is being created to hold the array before the assignment to self.Array and it will not be valid once the method exits.
Note that the documentation advises:
the C-API functions for allocating memory on the Python heap are generally preferred over the low-level C functions above as the memory they provide is actually accounted for in Python’s internal memory management system. They also have special optimisations for smaller memory blocks, which speeds up their allocation by avoiding costly operating system calls.
Accordingly you can write as below, which seems to handle this use case as intended:
from cpython.mem cimport PyMem_Malloc, PyMem_Realloc, PyMem_Free

cdef class Variable:
    cdef unsigned int Length
    cdef char * Array

    def __cinit__(self, var, size_t length):
        self.Length = length
        self.Array = <char *>PyMem_Malloc(length * sizeof(char))
        # as in docs, a good practice
        if not self.Array:
            raise MemoryError()
        for i in range(self.Length):
            self.Array[i] = <char>(var >> (8 * i))

    def __dealloc__(self):
        PyMem_Free(self.Array)
@rll's answer does a pretty good job of cleaning up the code and "doing everything properly" (most importantly the deallocation of memory, which was missing from __dealloc__ in the question!).
The actual issue causing the error is that you haven't cimported malloc. Because of this Cython assumes that malloc is a Python function, returning a Python object which you want to cast to a char*. At the top of your file, add
from libc.stdlib cimport malloc, free
and it'll work. Or alternatively use PyMem_Malloc (cimporting it) as @rll does, and that will work fine too.
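For reference, a minimal sketch of the class from the question with that cimport added (and a matching free in __dealloc__, which the original was missing):

from libc.stdlib cimport malloc, free

cdef class Variable:
    cdef unsigned int Length
    cdef char *Array

    def __cinit__(self, var, size_t length):
        self.Length = length
        self.Array = <char *>malloc(length * sizeof(char))
        if not self.Array:
            raise MemoryError()
        for i in range(self.Length):
            self.Array[i] = <char>(var >> (8 * i))

    def __dealloc__(self):
        free(self.Array)  # actually release the buffer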
I am writing a Cython wrapper around a C library we are maintaining. I am getting the following error message:
analog.pyx:6:66: Cannot convert 'unsigned short (*)' to Python object
Here's the code I am trying to write:
cimport company as lib

def get_value(device, channel, index):
    cdef unsigned short aValue
    err = library_get_data_val(device, channel, index, &aValue) # line 6
    # Ignore the err return value for StackOverflow.
    return aValue
The prototype of the C function I am trying to use is:
unsigned long library_get_data_val(unsigned long device, int channel,
                                   int index, unsigned short *pValue);
The library function returns the requested value in the aValue parameter. It's just an unsigned short primitive. What's the expected way of returning primitives (i.e. not struct) from these type of functions? I am new to Cython so the answer may be quite simple but I didn't see anything obvious through Google.
I think the problem is that you haven't declared library_get_data_val to Cython, so Cython thinks it's a Python function you're calling and doesn't know what to do with the pointer to aValue.
Try:
cdef extern from "header_containing_library_get_data_val.h":
    # I've taken a guess at the signature of library_get_data_val
    # Update it to match reality
    int library_get_data_val(int device, int channel, int index, int* value)
That way Cython knows it's a C-function that expects a pointer, and will be happy.
(Edited to be significantly changed from my original answer, where I misunderstood the problem!)
I found out what my problem was. You can probably tell what it is now that I've edited the question. There's a company.pxd file that's cimported by the .pyx file. Once I copied the C prototype into company.pxd it worked.
I also needed to use the lib prefix in my call:
err = lib.library_get_data_val(device, channel, index, &aValue) # line 6
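For anyone hitting the same thing, a sketch of what the relevant company.pxd declaration might look like (the header name company.h is an assumption):

# company.pxd
cdef extern from "company.h":
    unsigned long library_get_data_val(unsigned long device, int channel,
                                       int index, unsigned short *pValue)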
I am running into some problems and would like some help. I have a piece of code which is used to embed a Python script. This Python script contains a function which expects to receive an array as an argument (in this case I am using a numpy array within the Python script).
I would like to know how I can pass an array from C to the embedded Python script as an argument for the function within the script. More specifically, can someone show me a simple example of this?
Really, the best answer here is probably to use numpy arrays exclusively, even from your C code. But if that's not possible, then you have the same problem as any code that shares data between C types and Python types.
In general, there are at least five options for sharing data between C and Python:
1. Create a Python list or other object to pass.
2. Define a new Python type (in your C code) to wrap and represent the array, with the same methods you'd define for a sequence object in Python (__getitem__, etc.).
3. Cast the pointer to the array to intptr_t, or to an explicit ctypes type, or just leave it un-cast; then use ctypes on the Python side to access it.
4. Cast the pointer to the array to const char * and pass it as a str (or, in Py3, bytes), and use struct or ctypes on the Python side to access it.
5. Create an object matching the buffer protocol, and again use struct or ctypes on the Python side.
In your case, you want to use numpy.arrays in Python. So, the general cases become:
1. Create a numpy.array to pass.
2. (probably not appropriate)
3. Pass the pointer to the array as-is, and from Python, use ctypes to get it into a type that numpy can convert into an array.
4. Cast the pointer to the array to const char * and pass it as a str (or, in Py3, bytes), which is already a type that numpy can convert into an array.
5. Create an object matching the buffer protocol, which again I believe numpy can convert directly.
For 1, here's how to do it with a list, just because it's a very simple example (and I already wrote it…):
PyObject *makelist(int array[], size_t size) {
    PyObject *l = PyList_New(size);
    for (size_t i = 0; i != size; ++i) {
        PyList_SET_ITEM(l, i, PyInt_FromLong(array[i]));
    }
    return l;
}
And here's the numpy.array equivalent (assuming you can rely on the C array not to be deleted—see Creating arrays in the docs for more details on your options here):
PyObject *makearray(int array[], size_t size) {
    npy_intp dim = size;
    return PyArray_SimpleNewFromData(1, &dim, NPY_INT, (void *)array);
}
At any rate, however you do this, you will end up with something that looks like a PyObject * from C (and has a single refcount), so you can pass it as a function argument, while on the Python side it will look like a numpy.array, list, bytes, or whatever else is appropriate.
Now, how do you actually pass function arguments? Well, the sample code in Pure Embedding that you referenced in your comment shows how to do this, but doesn't really explain what's going on. There's actually more explanation in the extending docs than the embedding docs, specifically, Calling Python Functions from C. Also, keep in mind that the standard library source code is chock full of examples of this (although some of them aren't as readable as they could be, either because of optimization, or just because they haven't been updated to take advantage of new simplified C API features).
Skip the first example about getting a Python function from Python, because presumably you already have that. The second example (and the paragraph right above it) shows the easy way to do it: creating an argument tuple with Py_BuildValue. So, let's say we want to call a function you've got stored in myfunc with the list mylist returned by that makelist function above. Here's what you do:
if (!PyCallable_Check(myfunc)) {
    PyErr_SetString(PyExc_TypeError, "function is not callable?!");
    return NULL;
}
PyObject *arglist = Py_BuildValue("(O)", mylist);
PyObject *result = PyObject_CallObject(myfunc, arglist);
Py_DECREF(arglist);
return result;
You can skip the callable check if you're sure you've got a valid callable object, of course. (And it's usually better to check when you first get myfunc, if appropriate, because you can give both earlier and better error feedback that way.)
If you want to actually understand what's going on, try it without Py_BuildValue. As the docs say, the second argument to PyObject_CallObject is a tuple, and PyObject_CallObject(callable_object, args) is equivalent to apply(callable_object, args), which is equivalent to callable_object(*args). So, if you wanted to call myfunc(mylist) in Python, you have to turn that into, effectively, myfunc(*(mylist,)) so you can translate it to C. You can construct a tuple like this:
PyObject *arglist = PyTuple_Pack(1, mylist);
But usually, Py_BuildValue is easier (especially if you haven't already packed everything up as Python objects), and the intention in your code is clearer (just as using PyArg_ParseTuple is simpler and clearer than using explicit tuple functions in the other direction).
So, how do you get that myfunc? Well, if you've created the function from the embedding code, just keep the pointer around. If you want it passed in from the Python code, that's exactly what the first example does. If you want to, e.g., look it up by name from a module or other context, the APIs for concrete types like PyModule and abstract types like PyMapping are pretty simple, and it's generally obvious how to convert Python code into the equivalent C code, even if the result is mostly ugly boilerplate.
Putting it all together, let's say I've got a C array of integers, and I want to import mymodule and call a function mymodule.myfunc(mylist) that returns an int. Here's a stripped-down example (not actually tested, and no error handling, but it should show all the parts):
int callModuleFunc(int array[], size_t size) {
    PyObject *mymodule = PyImport_ImportModule("mymodule");
    PyObject *myfunc = PyObject_GetAttrString(mymodule, "myfunc");
    PyObject *mylist = PyList_New(size);
    for (size_t i = 0; i != size; ++i) {
        PyList_SET_ITEM(mylist, i, PyInt_FromLong(array[i]));
    }
    PyObject *arglist = Py_BuildValue("(O)", mylist);
    PyObject *result = PyObject_CallObject(myfunc, arglist);
    int retval = (int)PyInt_AsLong(result);
    Py_DECREF(result);
    Py_DECREF(arglist);
    Py_DECREF(mylist);
    Py_DECREF(myfunc);
    Py_DECREF(mymodule);
    return retval;
}
If you're using C++, you probably want to look into some kind of scope-guard/janitor/etc. to handle all those Py_DECREF calls, especially once you start doing proper error handling (which usually means early return NULL calls peppered through the function). If you're using C++11 or Boost, a unique_ptr with a custom deleter that calls Py_DECREF may be all you need.
But really, a better way to reduce all that ugly boilerplate, if you plan to do a lot of C<->Python communication, is to look at all of the familiar frameworks designed for improving extending Python—Cython, boost::python, etc. Even though you're embedding, you're effectively doing the same work as extending, so they can help in the same ways.
For that matter, some of them also have tools to help the embedding part, if you search around the docs. For example, you can write your main program in Cython, using both C code and Python code, and cython --embed. You may want to cross your fingers and/or sacrifice some chickens, but if it works, it's amazingly simple and productive. Boost isn't nearly as trivial to get started, but once you've got things together, almost everything is done in exactly the way you'd expect, and just works, and that's just as true for embedding as extending. And so on.
The Python function will need a Python object to be passed in. Since you want that Python object to be a NumPy array, you should use one of the NumPy C-API functions for creating arrays; PyArray_SimpleNewFromData() is probably a good start. It will use the buffer provided, without copying the data.
That said, it is almost always easier to write the main program in Python and use a C extension module for the C code. This approach makes it easier to let Python do the memory management, and the ctypes module together with Numpy's cpython extensions make it easy to pass a NumPy array to a C function.
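As a sketch of that last suggestion (the library and function names libmylib.so and process here are hypothetical):

import ctypes
import numpy as np

# Load the shared library and describe the C signature:
#   int process(int32_t *data, size_t n);
lib = ctypes.CDLL("./libmylib.so")
lib.process.argtypes = [np.ctypeslib.ndpointer(dtype=np.int32, ndim=1),
                        ctypes.c_size_t]
lib.process.restype = ctypes.c_int

arr = np.arange(10, dtype=np.int32)  # Python/NumPy owns the memory
result = lib.process(arr, arr.size)  # C receives a pointer into arr's buffer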