I've written a Python C++ extension, but I have a problem with one of its functions.
The function provided by this extension takes two arrays as input and produces one as output.
I've left only the relevant part of the function's code:
float* forward(float* input, float* kernels, npy_intp* input_dims, npy_intp* kernels_dims){
float* output = new float[output_size];
//some irrelevant matrix operation code
return output;
}
And the wrapper:
static PyObject *module_forward(PyObject *self, PyObject *args)
{
PyObject *input_obj, *kernels_obj;
if (!PyArg_ParseTuple(args, "OO", &input_obj, &kernels_obj))
return NULL;
PyObject *input_array = PyArray_FROM_OTF(input_obj, NPY_FLOAT, NPY_IN_ARRAY);
PyObject *kernels_array = PyArray_FROM_OTF(kernels_obj, NPY_FLOAT, NPY_IN_ARRAY);
if (input_array == NULL || kernels_array == NULL) {
Py_XDECREF(input_array);
Py_XDECREF(kernels_array);
return NULL;
}
float *input = (float*)PyArray_DATA(input_array);
float *kernels = (float*)PyArray_DATA(kernels_array);
npy_intp *input_dims = PyArray_DIMS(input_array);
npy_intp *kernels_dims = PyArray_DIMS(kernels_array);
/////////THE ACTUAL FUNCTION
float* output = forward(input, kernels, input_dims, kernels_dims);
Py_DECREF(input_array);
Py_DECREF(kernels_array);
npy_intp output_dims[4] = {input_dims[0], input_dims[1]-kernels_dims[0]+1, input_dims[2]-kernels_dims[1]+1, kernels_dims[3]};
PyObject* ret_output = PyArray_SimpleNewFromData(4, output_dims, NPY_FLOAT, output);
delete output;//<-----THE PROBLEMATIC LINE////////////////////////////
PyObject *ret = Py_BuildValue("O", ret_output);
Py_DECREF(ret_output);
return ret;
}
The delete operator that I highlighted is where the magic happens: without it, this function leaks memory; with it, it crashes with a memory access violation.
The fun thing is that I wrote another method that returns two arrays, so the function returns a float** pointing to two float* elements:
float** gradients = backward(input, kernels, grads, input_dims, kernel_dims, PyArray_DIMS(grads_array));
Py_DECREF(input_array);
Py_DECREF(kernels_array);
Py_DECREF(grads_array);
PyObject* ret_g_input = PyArray_SimpleNewFromData(4, input_dims, NPY_FLOAT, gradients[0]);
PyObject* ret_g_kernels = PyArray_SimpleNewFromData(4, kernel_dims, NPY_FLOAT, gradients[1]);
delete gradients[0];
delete gradients[1];
delete gradients;
PyObject* ret_list = PyList_New(0);
PyList_Append(ret_list, ret_g_input);
PyList_Append(ret_list, ret_g_kernels);
PyObject *ret = Py_BuildValue("O", ret_list);
Py_DECREF(ret_g_input);
Py_DECREF(ret_g_kernels);
return ret;
Notice that the second example works flawlessly, no crashes or memory leaks, while still calling delete on arrays after they have been built into PyArray objects.
Could someone enlighten me about what's going on in here?
From the PyArray_SimpleNewFromData docs:
Create an array wrapper around data pointed to by the given pointer.
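If you create an array with PyArray_SimpleNewFromData, it creates a wrapper around the data you give it rather than making a copy. That means the data it wraps has to outlive the array, and delete-ing the data violates that. Your second example has the same bug: reading freed memory is undefined behavior, so it only appears to work and may crash or return garbage at any time. (Separately, memory allocated with new float[...], as in forward, must be released with delete[], not delete.)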
You have several options:
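You could create the array differently so that it owns its own buffer, for example by allocating it with PyArray_SimpleNew and writing (or copying) the results into PyArray_DATA, so you don't just make a wrapper around the original data.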
You could carefully control access to the array and make sure its lifetime ends before you delete the data.
You could create a Python object that owns the data and will delete the data when the object's lifetime ends, and set the array's base to that object with PyArray_SetBaseObject, so the array keeps the owner object alive until the array itself dies.
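The difference between wrapping and copying can be seen without NumPy at all. The following is only a Python-level analogy, not the NumPy C API: array.array plays the role of the C buffer and memoryview plays the role of the wrapper that PyArray_SimpleNewFromData creates. All variable names are made up for illustration.

```python
import array

# The "C buffer": three floats owned by an array.array object.
data = array.array('f', [1.0, 2.0, 3.0])

# A wrapper: shares the same memory, no copy is made.
view = memoryview(data)

# A real copy: has its own memory.
copied = array.array('f', data)

data[0] = 9.0                      # change the original buffer
wrapper_sees_change = view[0] == 9.0
copy_sees_change = copied[0] == 9.0

# The wrapper keeps a reference to its owner, which is the same idea as
# giving a NumPy array a base object with PyArray_SetBaseObject.
owner_kept_alive = view.obj is data

# Python refuses to resize the buffer while a wrapper exists; C++ gives no
# such protection, so `delete` pulls the memory out from under the array.
try:
    data.append(4.0)
    resize_blocked = False
except BufferError:
    resize_blocked = True
```

The wrapper observes the change and the copy does not, which is why freeing the wrapped buffer breaks the array while freeing the source of a copy would be harmless.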
I'm venturing into C extensions for the first time, and am somewhat new to C as well. I've got a working C extension; however, if I repeatedly call the utility in Python, I eventually get a segmentation fault: 11.
#include <Python.h>
static PyObject *getasof(PyObject *self, PyObject *args) {
PyObject *fmap;
long dt;
if (!PyArg_ParseTuple(args, "Ol", &fmap, &dt))
return NULL;
long length = PyList_Size(fmap);
for (int i = 0; i < length; i++) {
PyObject *event = PyList_GetItem(fmap, i);
long dti = PyInt_AsLong(PyList_GetItem(event, 0));
if (dti > dt) {
PyObject *output = PyList_GetItem(event, 1);
return output;
}
}
Py_RETURN_NONE;
};
The function args are:
a time series (list of lists), e.g. [[1, 'a'], [5, 'b']]
a time (long), e.g. 4
It's supposed to iterate over the list of lists until it finds an entry whose timestamp is greater than the given time, then return that entry's value. As I mentioned, it correctly returns the answer, but if I call it enough times, it segfaults.
My gut feeling is that this has to do with reference counting, but I'm not familiar enough with the concept to know if this is the direct cause.
Any help would be appreciated.
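For reference, the behaviour described above can be written in a few lines of pure Python. This is only a sketch of the intended semantics as I read them from the C code (return None when no entry qualifies, as Py_RETURN_NONE does); it is not a replacement for the extension.

```python
def getasof(fmap, dt):
    """Return the value of the first [timestamp, value] pair with timestamp > dt."""
    for timestamp, value in fmap:
        if timestamp > dt:
            return value
    return None  # mirrors Py_RETURN_NONE in the C version


# The example inputs from the question.
result = getasof([[1, 'a'], [5, 'b']], 4)
```

In pure Python the returned value is automatically a reference the caller owns, which is exactly the bookkeeping the C version has to do by hand.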
"My gut feeling is that this has to do with reference counting..." Your instincts are correct.
PyList_GetItem returns a borrowed reference, which means your function doesn't "own" a reference to the item. So there is a problem here:
PyObject *output = PyList_GetItem(event, 1);
return output;
You don't own a reference to the item, but you return it to the caller, so the caller doesn't own a reference either. The caller will run into a problem if the item is garbage collected while the caller is still trying to use it. So you'll need to increase the reference count of the item before you return it:
PyObject *output = PyList_GetItem(event, 1);
Py_INCREF(output);
return output;
That assumes that PyList_GetItem(event, 1) doesn't fail! Except for PyArg_ParseTuple, you aren't checking the return values of the C API functions, which means you are assuming the input argument always has the exact structure that you expect. That's fine while you're testing code and figuring out how this works, but eventually you should be checking the return values of the C API functions for failure, and handling it appropriately.
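Why an un-owned reference is dangerous can be shown from pure Python on CPython, where an object is freed as soon as its last reference goes away. This is an illustrative sketch only: the Payload class and the variable names are hypothetical, and binding a second name to the object stands in for Py_INCREF.

```python
import weakref


class Payload:
    """Stand-in for the value stored at event[1]."""


# Case 1: nobody but the list owns the item (the "borrowed" situation).
event = [1, Payload()]
probe_borrowed = weakref.ref(event[1])   # observes the item without owning it
del event[1]                             # the list drops its only reference
borrowed_survived = probe_borrowed() is not None

# Case 2: the caller takes its own reference first (what Py_INCREF does).
event = [1, Payload()]
probe_owned = weakref.ref(event[1])
owned = event[1]                         # a second, independent reference
del event[1]
owned_survived = probe_owned() is owned
```

In the first case the object is gone the moment the list lets go of it; a C caller still holding that pointer would be reading freed memory. In the second case the extra reference keeps it alive.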
Can anyone explain to me how Python dictionary and list variables are stored in memory? I know that Python manages memory using a heap and a stack, but I couldn't find a simple explanation of how memory is allocated when a dictionary variable is created: is it created in a stack frame or in heap space?
Well, let's look at the source code to figure it out!
From: https://github.com/python/cpython/blob/master/Objects/dictobject.c
static PyObject *
new_dict(PyDictKeysObject *keys, PyObject **values)
{
PyDictObject *mp;
...
mp = PyObject_GC_New(PyDictObject, &PyDict_Type);
...
return (PyObject *)mp;
}
So a new dict object appears to be allocated using PyObject_GC_New()
From: https://github.com/python/cpython/blob/master/Doc/c-api/gcsupport.rst#id9
.. c:function:: TYPE* PyObject_GC_New(TYPE, PyTypeObject *type)
Analogous to :c:func:`PyObject_New` but for container objects with the
:const:`Py_TPFLAGS_HAVE_GC` flag set.
From: https://github.com/python/cpython/blob/master/Objects/object.c
PyObject *
_PyObject_New(PyTypeObject *tp)
{
PyObject *op;
op = (PyObject *) PyObject_MALLOC(_PyObject_SIZE(tp));
if (op == NULL)
return PyErr_NoMemory();
return PyObject_INIT(op, tp);
}
From: https://github.com/python/cpython/blob/master/Objects/obmalloc.c
#define MALLOC_ALLOC {NULL, _PyMem_RawMalloc, _PyMem_RawCalloc, _PyMem_RawRealloc, _PyMem_RawFree}
#ifdef WITH_PYMALLOC
# define PYMALLOC_ALLOC {NULL, _PyObject_Malloc, _PyObject_Calloc, _PyObject_Realloc, _PyObject_Free}
#endif
#define PYRAW_ALLOC MALLOC_ALLOC
#ifdef WITH_PYMALLOC
# define PYOBJ_ALLOC PYMALLOC_ALLOC
#else
# define PYOBJ_ALLOC MALLOC_ALLOC
#endif
static PyMemAllocatorEx _PyObject = PYOBJ_ALLOC;
...
void *
PyObject_Malloc(size_t size)
{
/* see PyMem_RawMalloc() */
if (size > (size_t)PY_SSIZE_T_MAX)
return NULL;
return _PyObject.malloc(_PyObject.ctx, size);
}
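I think it's safe to assume at this point that these end up calling malloc, calloc, realloc, and free, either directly or (when WITH_PYMALLOC is defined) through pymalloc, which carves small objects out of larger blocks it requests from the system.
At this point, this is no longer a Python question, but the answer is that the dict object is allocated on the heap: malloc and friends hand out dynamically allocated memory, never space in the caller's stack frame. A variable holds only a reference (a PyObject pointer) to that heap object.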
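A quick way to convince yourself that the dict object does not live in the function's stack frame is to watch it outlive that frame. A minimal sketch (make_dict and the variable names are made up for illustration):

```python
def make_dict():
    local = {"a": 1}   # the name `local` lives in this call's frame
    return local       # only the reference is handed back


d = make_dict()        # the frame is gone, the dict object is not
alias = d              # a second reference to the same object
alias["b"] = 2         # mutating through one name is visible through the other
```

If the dict were stored inside the frame, it could not survive the return; what the frame held was just a pointer to it.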
I am moderately experienced in python and C but new to writing python modules as wrappers on C functions. For a project I needed one function named "score" to run much faster than I was able to get in python so I coded it in C and literally just want to be able to call it from python. It takes in a python list of integers and I want the C function to get an array of integers, the length of that array, and then return an integer back to python. Here is my current (working) solution.
static PyObject *module_score(PyObject *self, PyObject *args) {
int i, size, value, *gene;
PyObject *seq, *data;
/* Parse the input tuple */
if (!PyArg_ParseTuple(args, "O", &data))
return NULL;
seq = PySequence_Fast(data, "expected a sequence");
size = PySequence_Size(seq);
gene = (int*) PyMem_Malloc(size * sizeof(int));
for (i = 0; i < size; i++)
gene[i] = PyInt_AsLong(PySequence_Fast_GET_ITEM(seq, i));
/* Call the external C function*/
value = score(gene, size);
PyMem_Free(gene);
/* Build the output tuple */
PyObject *ret = Py_BuildValue("i", value);
return ret;
}
This works but seems to leak memory and at a rate I can't ignore. I made sure that the leak is happening in the shown function by temporarily making the score function just return 0 and still saw the leaking behavior. I had thought that the call to PyMem_Free should take care of the PyMem_Malloc'ed storage but my current guess is that something in this function is getting allocated and retained on each call since the leaking behavior is proportional to the number of calls to this function. Am I not doing the sequence to array conversion correctly or am I possibly returning the ending value inefficiently? Any help is appreciated.
seq is a new reference returned by PySequence_Fast, so you need to release it with Py_DECREF when you are done with it. You should check whether seq is NULL, too.
Something like (untested):
static PyObject *module_score(PyObject *self, PyObject *args) {
int i, size, value, *gene;
long temp;
PyObject *seq, *data;
/* Parse the input tuple */
if (!PyArg_ParseTuple(args, "O", &data))
return NULL;
if (!(seq = PySequence_Fast(data, "expected a sequence")))
return NULL;
size = PySequence_Size(seq);
gene = (int*) PyMem_Malloc(size * sizeof(int));
if (gene == NULL) {
Py_DECREF(seq);
return PyErr_NoMemory();
}
for (i = 0; i < size; i++) {
temp = PyInt_AsLong(PySequence_Fast_GET_ITEM(seq, i));
if (temp == -1 && PyErr_Occurred()) {
/* Release everything acquired so far before bailing out */
PyMem_Free(gene);
Py_DECREF(seq);
PyErr_SetString(PyExc_ValueError, "an integer value is required");
return NULL;
}
/* Do whatever you need to verify temp will fit in an int */
gene[i] = (int)temp;
}
/* Call the external C function*/
value = score(gene, size);
PyMem_Free(gene);
Py_DECREF(seq);
/* Build the output tuple */
PyObject *ret = Py_BuildValue("i", value);
return ret;
}
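The range check that the comment in the code above leaves open ("verify temp will fit in an int") can be prototyped at the Python level with ctypes before writing it in C. This is a sketch under my own assumptions: to_c_int_array, INT_MIN and INT_MAX are hypothetical names, and the bounds are derived from the platform's sizeof(int).

```python
import ctypes

# Bounds of a C int on this platform, derived from its size in bytes.
_BITS = 8 * ctypes.sizeof(ctypes.c_int)
INT_MIN = -(1 << (_BITS - 1))
INT_MAX = (1 << (_BITS - 1)) - 1


def to_c_int_array(values):
    """Validate a sequence of Python ints and pack it into a C int array."""
    items = list(values)
    for v in items:
        if not isinstance(v, int):
            raise TypeError("an integer value is required")
        if not INT_MIN <= v <= INT_MAX:
            raise OverflowError("value does not fit in a C int")
    return (ctypes.c_int * len(items))(*items)


gene = to_c_int_array([1, 20, 3])
```

The explicit bounds test matters because ctypes itself does no overflow checking on integer values, just as the (int) cast in C silently truncates.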
Referring to http://mail.python.org/pipermail/python-dev/2009-June/090210.html
and http://dan.iel.fm/posts/python-c-extensions/
Here are other places I searched regarding my question:
http://article.gmane.org/gmane.comp.python.general/424736
http://joyrex.spc.uchicago.edu/bookshelves/python/cookbook/pythoncook-CHP-16-SECT-3.html
http://docs.python.org/2/c-api/sequence.html#PySequence_Check
Python extension module with variable number of arguments
I am inexperienced in Python/C API.
I have the following code:
sm_int_list = (1,20,3)
c_int_array = (ctypes.c_int * len(sm_int_list))(*sm_int_list)
sm_str_tuple = ('some','text', 'here')
On the C extension side, I have done something like this:
static PyObject* stuff_here(PyObject *self, PyObject *args)
{
char* input;
int *i1, *i2;
char *s1, *s2;
// args = (('some','text', 'here'), [1,20,3], ('some','text', 'here'), [1,20,3])
PyArg_ParseTuple(args, "(s#:):#(i:)#(s#:):#(i:)#", &s1, &i1, &s2, &i2);
/*stuff*/
}
such that:
stuff.here(('some','text', 'here'), [1,20,3], ('some','text', 'here'), [1,20,3])
returns data in the same form as args after some computation.
I would like to know whether this PyArg_ParseTuple expression is the proper way to parse:
a variable-length array of strings
an array of integers
UPDATE NEW
Is this the correct way?:
static PyObject* stuff_here(PyObject *self, PyObject *args)
unsigned int tint[], cint[];
ttotal=0, ctotal=0;
char *tstr, *cstr;
int *t_counts, *c_counts;
Py_ssize_t size;
PyObject *t_str1, *t_int1, *c_str2, *c_int2; //the C var that takes in the py variable value
PyObject *tseq, cseq;
int t_seqlen=0, c_seqlen=0;
if (!PyArg_ParseTuple(args, "OOiOOi", &t_str1, &t_int1, &ttotal, &c_str2, &c_int2, &ctotal))
{
return NULL;
}
if (!PySequence_Check(tag_str1) && !PySequence_Check(cat_str2)) return NULL;
else:
{
//All things t
tseq = PySequence_Fast(t_str1, "iterable");
t_seqlen = PySequence_Fast_GET_SIZE(tseq);
t_counts = PySequence_Fast(t_int1);
//All things c
cseq = PySequence_Fast(c_str2);
c_seqlen = PySequence_Fast_GET_SIZE(cseq);
c_counts = PySequence_Fast(c_int2);
//Make c arrays of all things tag and cat
for (i=0; i<t_seqlen; i++)
{
tstr[i] = PySequence_Fast_GET_ITEM(tseq, i);
tcounts[i] = PySequence_Fast_GET_ITEM(t_counts, i);
}
for (i=0; i<c_seqlen; i++)
{
cstr[i] = PySequence_Fast_GET_ITEM(cseq, i);
ccounts[i] = PySequence_Fast_GET_ITEM(c_counts, i);
}
}
OR
PyArg_ParseTuple(args, "(s:)(i:)(s:)(i:)", &s1, &i1, &s2, &i2)
And then again while returning,
Py_BuildValue("sisi", arr_str1,arr_int1,arr_str2,arr_int2) ??
In fact, if someone could clarify the various PyArg_ParseTuple format options in detail, that would be of great benefit. The Python C API documentation, as I find it, is not exactly a tutorial on how to do things.
You can use PyArg_ParseTuple to parse a real tuple that has a fixed structure. In particular, the number of items in the subtuples cannot change.
As the 2.7.5 documentation says, your format "(s#:):#(i:)#(s#:):#(i:)#" is wrong, since : cannot occur inside nested parentheses. The format "(sss)(iii)(sss)(iii)", along with a total of 12 pointer arguments, should match your arguments. Likewise, for Py_BuildValue you can use the same format string (which creates 4 tuples within 1 tuple), or "(sss)[iii](sss)[iii]" if the type matters (this puts the integers in lists instead of tuples).
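To make the two return shapes concrete, here is what the Python caller would see for each Py_BuildValue format string. This is a plain-Python picture of the resulting objects, not a call into the C API; the variable names are made up for illustration.

```python
strings = ('some', 'text', 'here')
ints = (1, 20, 3)

# Shape produced by "(sss)(iii)(sss)(iii)": four tuples inside one tuple.
all_tuples = (strings, ints, strings, ints)

# Shape produced by "(sss)[iii](sss)[iii]": the integers arrive in lists.
ints_in_lists = (strings, list(ints), strings, list(ints))

# One C pointer argument is needed per leaf value when parsing.
leaf_count = sum(len(group) for group in all_tuples)
```

The leaf count is the 12 pointer arguments mentioned above, and it is fixed by the format string, which is why this approach cannot handle subtuples of varying length.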
I have a little project that works beautifully with SWIG. In particular, some of my functions return std::vectors, which get translated to tuples in Python. Now, I do a lot of numerics, so I just have SWIG convert these to numpy arrays after they're returned from the c++ code. To do this, I use something like the following in SWIG.
%feature("pythonappend") My::Cool::Namespace::Data() const %{ if isinstance(val, tuple) : val = numpy.array(val) %}
(Actually, there are several functions named Data, some of which return floats, which is why I check that val is actually a tuple.) This works just beautifully.
But, I'd also like to use the -builtin flag that's now available. Calls to these Data functions are rare and mostly interactive, so their slowness is not a problem, but there are other slow loops that speed up significantly with the builtin option.
The problem is that when I use that flag, the pythonappend feature is silently ignored. Now, Data just returns a tuple again. Is there any way I could still return numpy arrays? I tried using typemaps, but it turned into a giant mess.
Edit:
Borealid has answered the question very nicely. Just for completeness, I include a couple related but subtly different typemaps that I need because I return by const reference and I use vectors of vectors (don't start!). These are different enough that I wouldn't want anyone else stumbling around trying to figure out the minor differences.
%typemap(out) std::vector<int>& {
npy_intp result_size = $1->size();
npy_intp dims[1] = { result_size };
PyArrayObject* npy_arr = (PyArrayObject*)PyArray_SimpleNew(1, dims, NPY_INT);
int* dat = (int*) PyArray_DATA(npy_arr);
for (size_t i = 0; i < result_size; ++i) { dat[i] = (*$1)[i]; }
$result = PyArray_Return(npy_arr);
}
%typemap(out) std::vector<std::vector<int> >& {
npy_intp result_size = $1->size();
npy_intp result_size2 = (result_size>0 ? (*$1)[0].size() : 0);
npy_intp dims[2] = { result_size, result_size2 };
PyArrayObject* npy_arr = (PyArrayObject*)PyArray_SimpleNew(2, dims, NPY_INT);
int* dat = (int*) PyArray_DATA(npy_arr);
for (size_t i = 0; i < result_size; ++i) { for (size_t j = 0; j < result_size2; ++j) { dat[i*result_size2+j] = (*$1)[i][j]; } }
$result = PyArray_Return(npy_arr);
}
Edit 2:
Though not quite what I was looking for, similar problems may also be solved using @MONK's approach (explained here).
I agree with you that using typemap gets a little messy, but it is the right way to accomplish this task. You are also right that the SWIG documentation does not directly say that %pythonappend is incompatible with -builtin, but it is strongly implied: %pythonappend adds to the Python proxy class, and the Python proxy class does not exist at all in conjunction with the -builtin flag.
Before, what you were doing was having SWIG convert the C++ std::vector objects into Python tuples, and then passing those tuples back down to numpy - where they were converted again.
What you really want to do is convert them once, at the C level.
Here's some code which will turn all std::vector<int> objects into NumPy integer arrays:
%{
#include "numpy/arrayobject.h"
%}
%init %{
import_array();
%}
%typemap(out) std::vector<int> {
npy_intp result_size = $1.size();
npy_intp dims[1] = { result_size };
PyArrayObject* npy_arr = (PyArrayObject*)PyArray_SimpleNew(1, dims, NPY_INT);
int* dat = (int*) PyArray_DATA(npy_arr);
for (size_t i = 0; i < result_size; ++i) {
dat[i] = $1[i];
}
$result = PyArray_Return(npy_arr);
}
This uses the C-level numpy functions to construct and return an array. In order, it:
Ensures NumPy's arrayobject.h file is included in the C++ output file
Causes import_array to be called when the Python module is loaded (otherwise, all NumPy methods will segfault)
Maps any returns of std::vector<int> into NumPy arrays with a typemap
This code should be placed before you %import the headers which contain the functions returning std::vector<int>. Other than that restriction, it's entirely self-contained, so it shouldn't add too much subjective "mess" to your codebase.
If you need other vector types, you can just change the NPY_INT and all the int* and int bits, otherwise duplicating the function above.